The Bureau of Customs (BOC), an attached agency of the Department of Finance, was established in 1902. It is the second largest revenue-collecting agency of the government after the Bureau of Internal Revenue. Republic Act No. 10863, also known as the Customs Modernization and Tariff Act, approved by former President Benigno S. Aquino III on May 30, 2016, is the latest law guiding the mandates and functions of the Bureau. Under this law, the Bureau is responsible for assessing and collecting customs revenues, controlling illicit trade and customs fraud, and facilitating trade. This law also explicitly affirms, in black and white, that the Bureau shall utilize information and communications technology to perform its functions. The CMTA implementation occurred during the time of former President Rodrigo Roa Duterte (PRRD) and several ICT online systems were implemented by BOC to curb graft and corruption and increase the collection of lawful revenues through duties and taxes.
Considering the changes in diplomatic policies, tax laws, and modernization initiatives in PRRD's administration, the team highlighted the following question as the center of this study:
"How did the Philippine importation perform during the Duterte administration?"
This study aims to identify the trends, patterns, and insights regarding the Philippine Importations from 2015 to 2022, especially those years during PRRD's Term, and to determine imported rice consumption patterns of Filipinos between 2015 (before RTL) and 2020 (a year after RTL implementation).
To aid our analysis, the team applied a methodology involving data extraction, data cleaning and processing, and the use of Exploratory Data Analysis (EDA) to guide the team for the questions that would be explored, analyzed, and answered at the end of this study.
The Duterte administration was known for its fight against illegal drugs, criminality, and corruption. The Philippine importations during the first and second half of PRRD's term showed an opposite trend between dutiable value and duties and taxes. From 2016 to 2018, the total dutiable value had an average growth rate of 25% but the total duties and taxes remained flat at 14%. However, from 2019 to 2022, the total dutiable value showed an average decrease of 1%, while the total duties and taxes posted a 19% increase. The increasing and all-time-high breaches in total duties and taxes assessed were mainly due to the appointment of the new BOC Commissioner Guerrero in 2019 amidst the COVID-19 pandemic. Guerrero's implementation of his 10-point priority program that included fully automating the transactions in the frontline and the pandemic made it urgent for everyone to immediately embrace the online filing and processing initiatives of the BOC. This resulted in a notable and remarkable increase in the total duties and tax assessments and a reduction in graft and corruption. Such a turn of events also resulted in another milestone where the total duties and taxes assessed after June 2020 breached 68 billion and never went below it again. This is truly a good example of digitalization initiatives that should be prioritized and capitalized on by the current administration to continue the positive and increasing trend in duties and taxes that the government collects.
When PRRD took office in 2016, he pledged to shift the Philippines away from the United States in favor of China and Russia. However, towards his last two years in office, he had a change of heart that can probably be dated to as early as June 2020 because of the amped-up assertiveness of Beijing in the South China Sea like the incidents involving hundreds of Chinese militia boats.
From 2015 to 2022, importations from China grew massively exceeding the 1 trillion-peso total dutiable value in 2018. It peaked in 2020 but started decreasing since then. On the other hand, the importations from the United States remained stable between the 300 to 500 billion annual total dutiable value. The total dutiable value of importations from Japan was a far second from China reaching only 800 billion pesos in 2020. Importations from Korea is ranked third in total dutiable value since 2015.
The electrical machinery and mineral fuels chapters have consistently bagged the top two spots since 2015. Electrical machinery chapter was largely imported from China so the effective duties and taxes were lower than that of the mineral fuels chapter since the latter had higher duties and excise tax rates, in addition to VAT.
The Rice Tariffication Law (RTL) took effect during PRRD’s administration in March 2019. Analysis of the data showed that rice importations from Viet Nam increased since the implementation of RTL. In 2015, there were 16.85 kilograms of imported rice for every person in the country. This proportion increased to 20.55 kilograms in 2020.
Imported rice from Pakistan and Myanmar has lower importation costs than Viet Nam and Thailand rice. Their average cost is around 20 pesos per kilogram while Viet Nam and Thailand rice are in the 23-25 peso range.
After summarizing the results of the team's thorough analysis of the available importation and census data, the team has arrived at the following recommendations to the BOC: improve data quality by adding stricter and more robust validation checks, evaluate customs memorandum orders carefully, continue implementing and improving existing BOC's modernization projects, and use Application Programming Interface to enable real-time public sharing of accurate BOC reports. The team also highlighted the following items for further studies/analysis: encourage importers to consider importing rice from countries offering lower effective rice costs such as Pakistan and Myanmar, boost trade with countries other than China, focus on the top 10 chapters when considering changes in trade-related government operations and tax law creation/modification since such chapters take at least 70% of the total imports, and finally redirect importations when there are cheaper but same quality alternatives to help drive down the cost of products in the Philippines.
How did the Philippine importation perform during the Duterte administration?
It’s public knowledge that Duterte made a considerable change to foreign policy by re-positioning the Philippines' diplomatic relations to more friendly and positive relations and engagement with China. We would like to know how this affected the importation of goods in the Philippines during the time of former President Rodrigo Roa Duterte (PRRD). Also, certain laws related to importation were implemented during his administration such as the Rice Tariffication Law (RTL). The RTL replaced the quantitative restrictions on imported rice with tariffs of 35 to 40 percent and established the Rice Competitiveness Enhancement Fund (RCEF) funded by the tariff revenues. We would like to know how this has changed rice importation and its relation to the population growth of the Philippines from 2015 to 2020.
The high-level methodology used in this study is as follows:
| No. | Step | Description |
|---|---|---|
| 1. | Data extraction | Obtain Philippine Customs importation data from 2015 to 2022 and Philippine Census data for years 2015 and 2020 from jojie-collected public datasets (directory: /mnt/data/public) and store those in sqlite3 databases. |
| 2. | Data cleaning | Prepare, clean, and process the collected data accordingly to get the relevant data subsets and columns. |
| 3. | Data processing | Remove or add necessary columns for further analysis and create functions in preparation for EDA. |
| 4. | Exploratory Data Analysis (EDA) | Provide a list of objectives and questions that would be explored, analyzed, and answered in the subsequent sections pertaining to trends, patterns, and insights regarding the Philippine Importation. |
The detailed steps performed related to the above methodology is presented in the Data Exploration and Results and Discussion sections of this document.
IMPORTATION DATA:
The source of the importation data is the Bureau of Customs' website's customs processing system called the E2M (Electronic-to-Mobile). It contains downloadable excel files that were scraped and made available to the team via the jojie-collected public datasets (directory: /mnt/data/public/customs) of the Asian Institute of Management (AIM).
The specific data used for this study includes all the monthly excel files from January 2015 to September 2022 which contain 33,293,684 rows and varying numbers of columns depending on the report available. Though there were more columns present in such files, only the following columns were used and considered relevant for this study:
| Column Name | Data Type | Short description |
|---|---|---|
| MONTH_YEAR | TEXT | Assessment month and year based on the filename |
| HS_CODE | TEXT | 11-digit Harmonized System (HS) code. This contains identification codes given to goods for use in international trade |
| COUNTRY_EXPORT | TEXT | Name of the Exporting Country |
| PREFERENTIAL_CODE | TEXT | Code used for preferential treatment of importation based on Free Trade Agreement |
| DUTIABLE_VALUE_FOREIGN | REAL | Financial value of the shipment based on invoice in foreign currency |
| CURRENCY | TEXT | Currency of the dutiable_value_foreign used by the importer |
| EXCHANGE_RATE | REAL | Exchange rate used to convert the foreign value to Philippine Peso |
| DUTIABLE_VALUE_PHP | REAL | Value of the shipment in Philippine Peso |
| DUTIES_AND_TAXES | REAL | System-calculated duties and taxes |
| EXCISE_ADVALOREM_PAID | REAL | System-calculated excise tax |
| VAT_BASE | REAL | The landed cost plus excise taxes, if any |
| VAT_PAID | REAL | System-calculated Value-Added Tax |
| NET_MASS_KGS | REAL | Net weight of the shipment in kilograms |
| GOODS_DESCRIPTION | TEXT | Description of the goods |
| CHAPTER | TEXT | Description of the HS Code |
PHILIPPINE CENSUS:
The source of the census data is the Philippine Statistics Authority. It also contains downloadable excel files that were scraped and made available to the team via the jojie-collected public datasets (directory: /mnt/data/public/customs) of AIM.
For this study, the team selected only the Table A - Population and Annual Growth Rate for the Philippines and its Regions, Provinces, and Highly Urbanized Cities file and used the Philippines' total population data for the years 2015 and 2020.
# Import libraries
import pandas as pd
import numpy as np
from numpy import arange
import sqlite3
import pickle
import re
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import warnings
import plotly.express as px
c_blue = '#0038a8'
c_yellow = '#fcd116'
c_red = '#ce1126'
c_gray = '#9c9a9d'
c_black = '#000000'
c_green = '#3cb043'
pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('mode.chained_assignment', None)
# plt.rcParams['font.family'] = 'sans-serif'
# plt.rcParams['font.sans-serif'] = 'DejaVu Sans'
plt.rcParams['axes.edgecolor'] = '#9c9a9d'
plt.rcParams['axes.linewidth'] = 0.8
plt.rcParams['xtick.color'] = c_black
plt.rcParams['ytick.color'] = c_black
plt.rcParams['grid.linewidth'] = 1
plt.rcParams["figure.figsize"] = (15, 10)
plt.rcParams['figure.dpi'] = 300
warnings.filterwarnings("ignore")
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-a3nueeu4 because the default path (/home/elacson/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
@ticker.FuncFormatter
def billion_formatter(x, pos):
"""Returns formatted values in billions."""
return '{:,.1f} B'.format(x/1E9)
@ticker.FuncFormatter
def million_formatter(x, pos):
"""Returns formatted values in millions."""
return '{:,.1f} M'.format(x/1E6)
@ticker.FuncFormatter
def thousands_formatter(x, pos):
"""Returns formatted values in thousands."""
return '{:,.1f} K'.format(x/1E3)
def display_full(x):
"""Returns a dataframe with customized display settings."""
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 2000)
pd.set_option('display.float_format', '{:20,.2f}'.format)
pd.set_option('display.max_colwidth', None)
display(x)
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
pd.reset_option('display.width')
pd.reset_option('display.float_format')
pd.reset_option('display.max_colwidth')
def highlight_peaks(ax):
"""Returns customized highlights for plots in the specified dates."""
ax.axvspan("2020-06-01", "2020-08-01", color="#fcd116", alpha=0.3)
ax.axvspan("2019-02-01", "2019-04-01", color="#fcd116", alpha=0.3)
ax.axvspan("2016-10-01", "2016-12-01", color="#fcd116", alpha=0.3)
ax.axvspan("2017-06-01", "2017-08-01", color="#fcd116", alpha=0.3)
def outliers():
"""Return a pandas data frame containing outliers removed from the
dataset.
"""
return pd.read_sql("""
SELECT *
FROM outliers
""", conn)
def trends_yearly():
"""Returns yearly importation trend based on dutiable value and duties
and taxes."""
# Create df
df_sum_years = (df_imports_agg.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'), 'HSCODE_2'])
.sum().sort_values(by=['MONTH_YEAR',
'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]).reset_index())
df_sum_years = df_sum_years.merge(df_chapters, on='HSCODE_2')
df_sum_years
# Plot yearly importation based on dutiable value in PHP
fig, ax = plt.subplots(figsize=(15, 20), dpi=200)
df_year = df_sum_years.groupby('MONTH_YEAR').sum().reset_index()
df_year.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
color=c_blue, legend=False, figsize=(15, 10), ax=ax)
marks = df_year[df_year['SUM_DUTIABLE_VALUE_PHP'] > 6e12].iloc[:, :2]
marks.columns = ['MONTH_YEAR', 'EXCEEDS_6_TRILLION']
marks.plot(x='MONTH_YEAR', y='EXCEEDS_6_TRILLION',
legend=False, style='r*', ms=15, ax=ax)
title = """Philippine imports' dutiable value peaked by 2018"""
ax.set_title(title, fontsize=20, color=c_blue)
ax.set_xlabel('Year', fontsize=16)
plt.xticks(fontsize=12)
ax.set_ylabel('Trillion PHP', fontsize=16)
ax.grid(axis='x')
ax.axvspan("2020-04-01", "2023", color=c_yellow, alpha=0.3)
style = dict(size=20, color=c_black)
ax.text('2020-05', 3.5e12, 'Covid-19 Pandemic',
ha='left', zorder=5, **style)
plt.show()
def highlight_peaks(ax):
"""Returns customized highlights for plots in the specified dates."""
ax.axvspan("2020-06-01", "2020-08-01", color="#fcd116", alpha=0.3)
ax.axvspan("2019-02-01", "2019-04-01", color="#fcd116", alpha=0.3)
ax.axvspan("2016-10-01", "2016-12-01", color="#fcd116", alpha=0.3)
ax.axvspan("2017-06-01", "2017-08-01", color="#fcd116", alpha=0.3)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 20), dpi=200)
fig.suptitle("Yearly trend of customs imports", fontsize=24)
#df_year = df_sum_years.groupby('MONTH_YEAR').sum().reset_index()
df_year['DUTES_AND_TAXES_PER_DUTIABLE_VALUE'] = \
df_year['SUM_DUTIES_AND_TAXES']*100/df_year['SUM_DUTIABLE_VALUE_PHP']
df_year.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
color=c_blue, ax=ax1)
df_year.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
color=c_red, ax=ax1)
df_year.plot(x='MONTH_YEAR', y='DUTES_AND_TAXES_PER_DUTIABLE_VALUE',
color=c_blue, ax=ax2)
title = """Massive increase in importation in 2018 did not
translate to duties and taxes.
New tax law rate implementation seen starting 2020"""
ax1.set_title(title, fontsize=18, color=c_blue)
ax1.set_xlabel(None)
ax1.set_ylabel('Trillion PHP', fontsize=16)
ax2.set_xlabel('Year', fontsize=16)
ax2.set_ylabel('% of Duties & Taxes vs Dutiable Value', fontsize=16)
ax1.tick_params(axis="x", labelsize=0)
ax2.tick_params(axis="x", labelsize=12)
ax1.legend([])
ax2.legend([])
ax1.grid(axis='x')
ax2.grid(axis='x')
highlight_peaks(ax1)
highlight_peaks(ax2)
plt.subplots_adjust(wspace=0, hspace=0)
fig.subplots_adjust(top=0.92)
style = dict(size=10, color=c_red)
ax1.text('2015-02', 6e11, 'DUTIES AND TAXES', ha='left', **style)
style = dict(size=10, color=c_blue)
ax1.text('2015-02', 3.3e12, 'DUTIABLE VALUE', ha='left', **style)
text = 'DUTIES AND TAXES PER \nDUTIABLE VALUE'
ax2.text('2015-02', 11.5, text, ha='left', **style)
ax2.set(ylim=(0, 20))
plt.show()
def trends_monthly():
"""Plot monthly trend based on dutiable value, duties & taxes, and
overall monthly trend of duties and taxes over dutiable value.
"""
fig, ax = plt.subplots(figsize=(15, 20), dpi=200)
fig.suptitle("Monthly sum of dutiable value", fontsize=24)
df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
df_month.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
color=c_blue, legend=False, figsize=(15, 10), ax=ax)
marks = df_month[df_month['SUM_DUTIABLE_VALUE_PHP'] > 7e11].iloc[:, :2]
marks.columns = ['MONTH_YEAR', 'EXCEEDS_700_BILLION']
marks.plot(x='MONTH_YEAR', y='EXCEEDS_700_BILLION',
legend=False, style='r*', ms=15, ax=ax)
title = """Total dutiable value exceeds 700B in the last four months,
that threshold was surpassed only four other times before."""
ax.set_title(title, fontsize=20, color=c_blue)
ax.set_xlabel('Year', fontsize=16)
plt.xticks(fontsize=12)
ax.set_ylabel('Trillion PHP', fontsize=16)
ax.grid(axis='x')
ax.axhline(y=7e11, color=c_red, linestyle='--')
ax.axvspan("2020-04-01", "2023", color=c_yellow, alpha=0.3)
style = dict(size=20, color=c_red)
ax.text('2015-02', 7.1e11, '700 Billion PHP', ha='left', zorder=5, **style)
style = dict(size=20, color=c_black)
ax.text('2020-05', 18e10, 'Covid-19 Pandemic',
ha='left', zorder=5, **style)
plt.show()
# Plot by duties and taxes
fig, ax = plt.subplots(figsize=(15, 20), dpi=200)
fig.suptitle("Monthly sum of duties and taxes", fontsize=24)
df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
df_month['SUM_DUTIES_AND_TAXES'] = df_month['SUM_DUTIES_AND_TAXES'] / 1e9
df_month.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
color=c_blue, legend=False, figsize=(15, 10), ax=ax)
title = """
The significant jump in duties and taxes started during the pandemic:
All months after June 2020 are greater than all months before"""
plt.title(title, fontsize=20, color=c_blue)
plt.xlabel('Year', fontsize=16)
plt.xticks(fontsize=12)
plt.ylabel('Billion PHP', fontsize=16)
plt.grid(axis='x')
plt.axvspan("2020-04-01", "2023", color="#fcd116", alpha=0.3)
ax.axhline(y=68, color=c_red, linestyle='--')
style = dict(size=15, color=c_blue)
ax.text('2017-01', 50, 'SUM OF DUTIES AND TAXES', ha='center', **style)
style = dict(size=20, color=c_red)
ax.text('2015-02', 68.5, 'June 2020 (68B PHP)', ha='left', **style)
style = dict(size=20, color=c_black)
ax.text('2020-05', 12, 'Covid-19 Pandemic', ha='left', **style)
plt.show()
# Plot overall monthly trend of duties and taxes over dutiable value
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 20), dpi=200)
fig.suptitle("Monthly trend of customs imports", fontsize=24)
df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
df_month['DUTES_AND_TAXES_PER_DUTIABLE_VALUE'] = \
df_month['SUM_DUTIES_AND_TAXES']*100/df_month['SUM_DUTIABLE_VALUE_PHP']
df_month.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
color=c_blue, ax=ax1)
df_month.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
color=c_red, ax=ax1)
df_month.plot(x='MONTH_YEAR', y='DUTES_AND_TAXES_PER_DUTIABLE_VALUE',
color=c_blue, ax=ax2)
title = """Huge spikes in importation did not translate to duties and taxes,
Items that caused the spikes had tax exemptions"""
ax1.set_title(title, fontsize=24, color=c_blue)
ax1.set_xlabel(None)
ax1.set_ylabel('Trillion PHP', fontsize=16)
ax2.set_xlabel('Year', fontsize=16)
ax2.set_ylabel('Percentage', fontsize=16)
ax1.tick_params(axis="x", labelsize=0)
ax2.tick_params(axis="x", labelsize=12)
ax1.legend([])
ax2.legend([])
ax1.grid(axis='x')
ax2.grid(axis='x')
highlight_peaks(ax1)
highlight_peaks(ax2)
plt.subplots_adjust(wspace=0, hspace=0)
fig.subplots_adjust(top=0.92)
style = dict(size=10, color=c_red)
ax1.text('2015-02', 5e10, 'SUM DUTIES AND TAXES', ha='left', **style)
style = dict(size=10, color=c_blue)
ax1.text('2015-02', 4.5e11, 'SUM DUTIABLE VALUE', ha='left', **style)
text = 'DUTIES AND TAXES PER \nDUTIABLE VALUE'
ax2.text('2015-02', 14, text, ha='left', **style)
ax2.set(ylim=(0, 25))
plt.show()
def prrd_top_bot_chapters():
"""Returns plots of the top 10 and bottom 5 chapters during PRRD's Term.
"""
# Plot Top 10 Chapters using a bar graph based on dutiable value
# Create a df containing importation data during PRRD's term
mask = ((df_imports_agg['MONTH_YEAR'] >= "2016-06-01") &
(df_imports_agg['MONTH_YEAR'] <= "2022-06-01"))
df_prrd = df_imports_agg[mask]
# Save the top & bottom chapters by dutiable value & duties & taxes in PHP
n = 10
df_topn_dv = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
['SUM_DUTIABLE_VALUE_PHP']
.sum()
.sort_values(ascending=False)[:n]
.reset_index())
df_topn_dt = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
['SUM_DUTIES_AND_TAXES']
.sum().sort_values(ascending=False)[:n].reset_index())
n = 5
df_botn_dv = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
['SUM_DUTIABLE_VALUE_PHP']
.sum().sort_values(ascending=True)[:n].reset_index())
df_botn_dt = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
['SUM_DUTIES_AND_TAXES']
.sum().sort_values(ascending=True)[:n].reset_index())
# Set labels
label_top_dv = ('Electrical machinery \nand equipment',
'Mineral fuels, mineral oils',
'Nuclear reactors, boilers',
'Vehicles other than railway \nor tramway rolling-stock',
'Plastics and articles thereof',
'Iron and steel',
'Ores, slag and ash',
'Cereals',
'Optical, photographic, \ncinematographic',
'Pharmaceutical products')
label_top_dt = ('Mineral fuels, mineral oils',
'Electrical machinery \nand equipment',
'Vehicles other than railway \nor tramway rolling-stock',
'Nuclear reactors, boilers',
'Plastics and articles thereof',
'Iron and steel', 'Miscellaneous chemical\n products',
'Cereals', 'Miscellaneous edible \npreparations',
'Ores, slag and ash')
# Based on dutiable value
x = df_topn_dv['CHAPTER']
y = df_topn_dv['SUM_DUTIABLE_VALUE_PHP']
font1 = {'family': 'serif', 'color': c_blue}
fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
fig.suptitle("Top 10 chapters during PRRD's term",
fontsize=24, fontdict=font1)
# Top 10 chapters based on dutiable value
ax.barh(x, y, color=[c_blue, c_yellow, 'lightgray', 'lightgray',
'lightgray', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray'], tick_label=label_top_dv)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Dutiable Value in Trillion PHP', fontsize=16)
ax.set_title("PH imports Chapter 85: Electrical machinery the most, "
"but ...",
fontsize=20, color=c_blue)
ax.yaxis.set_label_position("right")
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.yaxis.tick_right()
plt.show()
# Top 10 chapters based on duties and taxes
fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
x = df_topn_dt['CHAPTER']
y = df_topn_dt['SUM_DUTIES_AND_TAXES']
ax.barh(x, y, color=[c_blue, c_yellow, 'lightgray', 'lightgray',
'lightgray', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray'], tick_label=label_top_dt)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Total duties and taxes in trillion PHP', fontsize=16)
ax.set_title("PH collects more duties and taxes from chapter 27: "
"Mineral fuels", fontsize=20, color=c_blue)
ax.yaxis.set_label_position("right")
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.yaxis.tick_right()
plt.show()
# Plot treemap of chapters based on Dutiable value
# Set labels
label_top_dv = df_topn_dv.HSCODE_2.tolist()
label_top_dt = df_topn_dt.HSCODE_2.tolist()
df_tree_dv = (df_prrd.groupby(['HSCODE_2'])
['SUM_DUTIABLE_VALUE_PHP'].sum().reset_index())
df_tree_dv['TOP'] = np.where(df_tree_dv['HSCODE_2'].isin(label_top_dv),
'Part of Top 10', 'Others')
fig = px.treemap(df_tree_dv, path=['HSCODE_2'],
values='SUM_DUTIABLE_VALUE_PHP', width=1000, height=700,
color='TOP',
color_discrete_map={'Part of Top 10': c_blue,
'Others': c_gray})
fig.update_layout(title=dict(
text='70% of importations based on dutiable value '
'came from the top 10 chapters...',
font=dict(size=18, color=c_blue)),
margin=dict(t=50, l=25, r=25, b=25),
uniformtext_minsize=16, uniformtext_mode='hide')
fig.show()
# Plot treemap of chapters based on Duties & Taxes
df_tree_dt = (df_prrd.groupby(['HSCODE_2'])
['SUM_DUTIES_AND_TAXES'].sum().reset_index())
df_tree_dt['TOP'] = np.where(df_tree_dt['HSCODE_2'].isin(label_top_dt),
'Part of Top 10', 'Others')
fig = px.treemap(df_tree_dt, path=['HSCODE_2'],
values='SUM_DUTIES_AND_TAXES', width=1000, height=700,
color='TOP',
color_discrete_map={'Part of Top 10': c_blue,
'Others': c_gray})
fig.update_layout(title=dict(
text='While 74% based on duties & taxes '
'came from the top 10 chapters',
font=dict(size=18, color=c_blue)),
margin=dict(t=50, l=25, r=25, b=25),
uniformtext_minsize=16, uniformtext_mode='hide')
fig.show()
# Plot bottom chapters
# Based on dutiable value
x = df_botn_dv['CHAPTER']
y = df_botn_dv['SUM_DUTIABLE_VALUE_PHP']
label_bot_dv = ['Vegetable plaiting materials',
'Live trees and other plants',
'Manufactures of straw & \nplaiting materials',
'Cork and articles of cork',
"Works of art, collectors' \npieces and antiques"]
label_bot_dt = ['Vegetable plaiting materials', 'Live animals', 'Silk',
'Live trees and other plants',
'Cork and articles of cork']
font1 = {'family': 'serif', 'color': c_red}
fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
fig.suptitle("Bottom 5 chapters during PRRD's term makes sense...",
fontsize=24, fontdict=font1)
# Bottom 5 chapters based on dutiable value
ax.barh(x, y, color=[c_yellow, c_yellow, 'lightgray', 'lightgray',
'lightgray'], tick_label=label_bot_dv)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Dutiable value in billion PHP', fontsize=16, color=c_red)
ax.set_title("Vegetable plaiting is the least imported product "
"and live trees are highly regulated",
fontsize=20, color=c_red)
ax.yaxis.set_label_position("right")
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.yaxis.tick_right()
plt.show()
# Bottom 5 chapters based on duties and taxes
fig, ax = plt.subplots(figsize=(15, 8), dpi=200)
x = df_botn_dt['CHAPTER']
y = df_botn_dt['SUM_DUTIES_AND_TAXES']
ax.barh(x, y, color=['lightgray', c_yellow, 'lightgray', c_yellow,
'lightgray'], tick_label=label_bot_dt)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Total duties and taxes in 10 Million PHP', fontsize=20,
color=c_red)
ax.set_title("Live animals and trees are in the bottom due to import"
" restrictions", fontsize=20, color=c_red)
ax.yaxis.set_label_position("right")
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.yaxis.tick_right()
plt.show()
def top5_compare_chapters():
"""Returns plots of comparison of top 5 chapters based on dutiable
value against duties & taxes.
"""
# Create DF for top 200 (DUTIABLE VALUE)
df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
topn = 200
df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
# Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
df_top_yr_hs_dv = (df_top_200.reset_index())
df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
'SUM_DUTIABLE_VALUE_PHP']
.rank('dense', ascending=False))
df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
df_rank_15_dv
# Rank by [Month_Year and HSCODE] (DUTY TAXES)
df_top_yr_hs_dt = (df_top_200.reset_index())
df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
'SUM_DUTIES_AND_TAXES']
.rank('dense', ascending=False))
df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
df_rank_15_dt
# Create a list of Top Items (DV & DT)
list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()
# Create a merged DF for identified top15 HSCODES (DV & DT)
df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dv)]
df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dt)]
# Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_yr_hs_dv = (df_imports_merged_dv
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
df_top15_yr_hs_dt = (df_imports_merged_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_top5_sdt = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nlargest(5, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['SUM_DUTIES_AND_TAXES'],
ascending=[False])
.reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']]
df_top5_sdvp = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nlargest(5, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
ascending=[False])
.reset_index()
.set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']])
df_color = df_top5_sdvp.reset_index().drop(
['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS'], axis=1)
colors = {'85': '#0038a8',
'27': '#fcd116',
'84': '#a9a9a9',
'87': '#989898',
'39': '#878787'}
# FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
df_top5_sdvp_plt = df_top5_sdvp.reset_index().drop('HSCODE_2', axis=1)
label1 = ['Top 1: CH85', 'Top 2: CH27',
'Top 3: CH84', 'Top 4: CH87', 'Top 5: CH39']
# FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
df_top5_sdt_plt = df_top5_sdt.reset_index().drop('HSCODE_2', axis=1)
label2 = ['Top 2.5: CH85', 'Top 1: CH27',
'Top 2.5: CH84', 'Top 2: CH87', 'Top 5: CH39']
fig, ax = plt.subplots(1, 2, figsize=(13, 6))
ax[0].pie(x=df_top5_sdvp_plt['SUM_DUTIABLE_VALUE_PHP'], autopct='%1.2f%%',
colors=[
colors[v] for v in df_color['HSCODE_2'].value_counts().keys()])
ax[0].legend(label1, loc='best', bbox_to_anchor=(1, 0.75))
ax[1].pie(x=df_top5_sdvp_plt['SUM_DUTIES_AND_TAXES'], autopct='%1.2f%%',
colors=[
colors[v] for v in df_color['HSCODE_2'].value_counts().keys()])
ax[1].legend(label2, loc='best', bbox_to_anchor=(1, 0.75))
ax[0].set_title('SUM_DUTIABLE_VALUE_PHP', fontsize=8, y=-0.01)
ax[1].set_title('SUM_DUTIES_AND_TAXES_PHP', fontsize=8, y=-0.01)
plt.suptitle(
'Top 5 chapters comparison: dutiable value and duties & taxes',
fontsize=16)
plt.tight_layout()
plt.plot()
def yearly_top20_chapters():
"""Returns a plot of the top 20 chapters."""
# Create DF for top 200 (DUTIABLE VALUE)
df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
topn = 200
df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
# Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
df_top_yr_hs_dv = (df_top_200.reset_index())
df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
'SUM_DUTIABLE_VALUE_PHP']
.rank('dense', ascending=False))
df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
df_rank_15_dv
# Rank by [Month_Year and HSCODE] (DUTY TAXES)
df_top_yr_hs_dt = (df_top_200.reset_index())
df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
'SUM_DUTIES_AND_TAXES']
.rank('dense', ascending=False))
df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
df_rank_15_dt
# Create a list of Top Items (DV & DT)
list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()
# Create a merged DF for identified top15 HSCODES (DV & DT)
df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dv)]
df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dt)]
# Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_yr_hs_dv = (df_imports_merged_dv
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
df_top15_yr_hs_dt = (df_imports_merged_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
# PLOT ALL TOP 15 wrt DUTIABLE VALUE
# Highlight Top 1 and Top 2
ax = df_top15_hs_dv_pvt.drop(columns=['85', '27']).plot(
figsize=(15, 10), color='#9c9a9d', alpha=0.3)
df_top15_hs_dv_pvt['85'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
df_top15_hs_dv_pvt['27'].plot(color='#fcd116', ax=ax, legend=True, lw=5)
style = dict(size=10, color='#fcd116')
ax.text('2017-02', .87E12, ' CHAPTER 27', ha='left', **style)
style = dict(size=10, color='#0038a8')
ax.text('2017-02', 1.55E12, 'CHAPTER 85', ha='left', **style)
plt.suptitle(
'Chapter 85 consistently leads the dutiable value since 2016',
fontsize=18, color=c_blue)
plt.title("Chapter 27, though rank 2 in overall dutiable value, \n"
"has a fluctuating trend with a big dip from 2019 to 2020",
fontsize=16)
plt.show()
# PLOT ALL TOP 15 wrt DUTY TAX
# Highlight Top 1 and Top 2
df_top15_hs_dt_pvt
ax = df_top15_hs_dt_pvt.drop(columns=['85', '27']).plot(
figsize=(15, 10), color='#9c9a9d', alpha=0.3)
df_top15_hs_dt_pvt['85'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
df_top15_hs_dt_pvt['27'].plot(color='#fcd116', ax=ax, legend=True, lw=5)
plt.suptitle(
'On the other hand, chapter 27 consistently leads assessed \n'
'import duties & taxes since 2015', fontsize=18, color=c_blue)
plt.title('Chapter 85 is rank 2 in assessed import duties & taxes',
fontsize=16)
style = dict(size=10, color='#fcd116')
ax.text('2017-02', 1.38E11, 'CHAPTER 27', ha='left', **style)
style = dict(size=10, color='#0038a8')
ax.text('2017-02', .4E11, 'CHAPTER 85', ha='left', **style)
plt.show()
def compare_top2_chapters():
"""Returns plots comparing top 2 chapters based on dutiable value
against duties & taxes."""
# Create DF for top 200 (DUTIABLE VALUE)
df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
topn = 200
df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
# Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
df_top_yr_hs_dv = (df_top_200.reset_index())
df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
'SUM_DUTIABLE_VALUE_PHP']
.rank('dense', ascending=False))
df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
df_rank_15_dv
# Rank by [Month_Year and HSCODE] (DUTY TAXES)
df_top_yr_hs_dt = (df_top_200.reset_index())
df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
'SUM_DUTIES_AND_TAXES']
.rank('dense', ascending=False))
df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
df_rank_15_dt
# Create a list of Top Items (DV & DT)
list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()
# Create a merged DF for identified top15 HSCODES (DV & DT)
df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dv)]
df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_top_hscode_dt)]
# Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_yr_hs_dv = (df_imports_merged_dv
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
df_top15_yr_hs_dt = (df_imports_merged_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 6))
df_top15_hs_dv_pvt['85'].plot(color='#0038a8', ax=ax1, legend=True, lw=5)
df_top15_hs_dv_pvt['27'].plot(color='#fcd116', ax=ax1, legend=True, lw=5)
ax1.legend(loc='upper left')
df_top15_hs_dt_pvt['85'].plot(color='#0038a8', ax=ax2, legend=True, lw=5)
df_top15_hs_dt_pvt['27'].plot(color='#fcd116', ax=ax2, legend=True, lw=5)
plt.legend(loc='upper left')
ax1.set_title(
'Segmentation based on consistent behavior for the 2 chapters. \n '
'Segmented into 2 [2019-2020] & [2020-2022]', fontsize=8)
ax2.set_title('As opposed to the figure on the right, segmentations from '
'behavior is now 3. \n Duties & taxes for CH 85 '
'deviated from normal', fontsize=8)
style = dict(size=10, color='#fcd116')
ax1.text('2015-02', 0.6E12, ' CHAPTER 27', ha='left', **style)
style = dict(size=10, color='#0038a8')
ax1.text('2015-02', 1.4E12, ' CHAPTER 85', ha='left', **style)
style = dict(size=10, color='#fcd116')
ax2.text('2015-02', 01.5E11, ' CHAPTER 27', ha='left', **style)
style = dict(size=10, color='#0038a8')
ax2.text('2015-02', 0.5e11, ' CHAPTER 85', ha='left', **style)
ax1.axvline(x=pd.to_datetime('2019-06-15'), color='k', ls='--')
ax1.axvline(x=pd.to_datetime('2020-06-15'), color='k', ls='--')
ax2.axvline(x=pd.to_datetime('2020-06-15'), color='k', ls='--')
ax2.axvline(x=pd.to_datetime('2021-06-15'), color='k', ls='--')
ax1.axvspan("2019", "2020", color="b", alpha=0.3)
ax1.axvspan("2020", "2022", color="#fcd116", alpha=0.3)
ax2.axvspan("2019", "2020", color="b", alpha=0.3)
ax2.axvspan("2020", "2021", color="r", alpha=0.3)
ax2.axvspan("2021", "2022", color="#fcd116", alpha=0.3)
plt.suptitle(
'There is an opposite trend behavior for chapter 27 in year 2020 to 2021',
fontsize=16, color=c_blue)
plt.tight_layout()
plt.plot()
def bottom5_compare_chapters():
"""Returns a plot of the bottom 5 chapters based on dutiable value
against duties and taxes.
"""
# Create DF for top 200 (DUTIABLE VALUE)
df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
topn = 200
df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
botn = 200
df_bot_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
# Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
df_bot_yr_hs_dv = (df_bot_200.reset_index())
df_bot_yr_hs_dv['rank'] = (df_bot_yr_hs_dv.groupby(['MONTH_YEAR'])[
'SUM_DUTIABLE_VALUE_PHP']
.rank('dense', ascending=True))
df_rank_bot_15_dv = df_bot_yr_hs_dv[df_bot_yr_hs_dv['rank'] <= 15]
# Rank by [Month_Year and HSCODE] (DUTY TAXES)
df_bot_yr_hs_dt = (df_bot_200.reset_index())
df_bot_yr_hs_dt['rank'] = (df_bot_yr_hs_dt.groupby(['MONTH_YEAR'])[
'SUM_DUTIES_AND_TAXES']
.rank('dense', ascending=True))
df_rank_bot_15_dt = df_bot_yr_hs_dt[df_bot_yr_hs_dt['rank'] <= 15]
# Create a list of Bottom Items (DV & DT)
list_bot_hscode_dv = df_rank_bot_15_dv['HSCODE_2'].unique().tolist()
list_bot_hscode_dt = df_rank_bot_15_dt['HSCODE_2'].unique().tolist()
# Create a merged DF for identified top15 HSCODES (DV & DT)
df_imports_merged_bot_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_bot_hscode_dv)]
df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_bot_hscode_dt)]
# Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_bot15_yr_hs_dv = (df_imports_merged_bot_dv
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(15, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(botn, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_bot15_hs_dv_pvt = (df_bot15_yr_hs_dv.reset_index().pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_bot5_sdt = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nsmallest(5, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['SUM_DUTIES_AND_TAXES'],
ascending=[True])
.reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']]
df_bot5_sdvp = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nsmallest(5, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
ascending=[True])
.reset_index()
.set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']])
df_colorb = df_bot5_sdvp.reset_index().drop(
['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS'], axis=1)
df_colorbt = df_bot5_sdt.reset_index().drop(
['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS'], axis=1)
colors = {'14': '#0038a8',
'45': '#fcd116',
'06': c_red,
'46': '#a9a9a9',
'97': '#878787',
'01': '#878787',
'50': '#a9a9a9'}
# FINAL GRAPH COMPARISON
# FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
df_bot5_sdvp_plt = df_bot5_sdvp.reset_index().drop('HSCODE_2', axis=1)
label1 = ['Bot 1: CH14', 'Bot 2: CH06',
'Bot 3: CH46', 'Bot 4: CH45', 'Bot 5: CH97']
# FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
df_bot5_sdt_plt = df_bot5_sdt.reset_index().drop('HSCODE_2', axis=1)
label2 = ['Bot 5: CH14', 'Bot 2: CH01',
'Bot 4: CH50', 'Bot 3: CH45', 'Bot 1: CH06']
fig, ax = plt.subplots(1, 2, figsize=(13, 6))
ax[0].pie(x=df_bot5_sdvp_plt['SUM_DUTIABLE_VALUE_PHP'], autopct='%1.2f%%', colors=[
colors[v] for v in df_colorb['HSCODE_2'].value_counts().keys()])
ax[0].legend(label1, loc='best', bbox_to_anchor=(1, 0.75))
ax[1].pie(x=df_bot5_sdvp_plt['SUM_DUTIES_AND_TAXES'], autopct='%1.2f%%', colors=[
colors[v] for v in df_colorbt['HSCODE_2'].value_counts().keys()])
ax[1].legend(label2, loc='best', bbox_to_anchor=(1, 0.75))
ax[0].set_title('SUM_DUTIABLE_VALUE_PHP', fontsize=8, y=-0.01)
ax[1].set_title('SUM_DUTIES_AND_TAXES', fontsize=8, y=-0.01)
plt.suptitle(
'Bottom 5 chapters comparison: Dutiable value and duties & taxes', fontsize=16)
plt.tight_layout()
plt.plot()
def yearly_bottom5_chapters():
"""Returns plots of yearly bottom 5 chapters based on dutiable value and
duties and taxes.
"""
# Create DF for top 200 (DUTIABLE VALUE)
df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
topn = 200
df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
botn = 200
df_bot_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(topn, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
# Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
df_bot_yr_hs_dv = (df_bot_200.reset_index())
df_bot_yr_hs_dv['rank'] = (df_bot_yr_hs_dv.groupby(['MONTH_YEAR'])[
'SUM_DUTIABLE_VALUE_PHP']
.rank('dense', ascending=True))
df_rank_bot_15_dv = df_bot_yr_hs_dv[df_bot_yr_hs_dv['rank'] <= 15]
# Rank by [Month_Year and HSCODE] (DUTY TAXES)
df_bot_yr_hs_dt = (df_bot_200.reset_index())
df_bot_yr_hs_dt['rank'] = (df_bot_yr_hs_dt.groupby(['MONTH_YEAR'])[
'SUM_DUTIES_AND_TAXES']
.rank('dense', ascending=True))
df_rank_bot_15_dt = df_bot_yr_hs_dt[df_bot_yr_hs_dt['rank'] <= 15]
# Create a list of Bottom Items (DV & DT)
list_bot_hscode_dv = df_rank_bot_15_dv['HSCODE_2'].unique().tolist()
list_bot_hscode_dt = df_rank_bot_15_dt['HSCODE_2'].unique().tolist()
# Create a merged DF for identified top15 HSCODES (DV & DT)
df_imports_merged_bot_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_bot_hscode_dv)]
df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_bot_hscode_dt)]
# Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_bot15_yr_hs_dv = (df_imports_merged_bot_dv
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(15, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
ascending=[True, False]))
df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(botn, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
df_bot15_hs_dv_pvt = (df_bot15_yr_hs_dv.reset_index().pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
df_bot5_sdt = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nsmallest(5, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['SUM_DUTIES_AND_TAXES'],
ascending=[True])
.reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']]
df_bot5_sdvp = (df_imports_merged
.groupby('HSCODE_2')
.sum()
.nsmallest(5, 'SUM_DUTIABLE_VALUE_PHP')
.sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
ascending=[True])
.reset_index()
.set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS']])
df_colorb = df_bot5_sdvp.reset_index().drop(
['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS'], axis=1)
df_colorbt = df_bot5_sdt.reset_index().drop(
['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
'SUM_NET_MASS_KGS'], axis=1)
colors = {'14': '#0038a8',
'45': '#fcd116',
'06': '#3cb043',
'46': '#a9a9a9',
'97': '#878787',
'01': '#878787',
'50': '#a9a9a9'}
# FINAL GRAPH COMPARISON
# FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
df_bot5_sdvp_plt = df_bot5_sdvp.reset_index().drop('HSCODE_2', axis=1)
label1 = ['Bot 1: CH14', 'Bot 2: CH06',
'Bot 3: CH46', 'Bot 4: CH45', 'Bot 5: CH97']
# FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
df_bot5_sdt_plt = df_bot5_sdt.reset_index().drop('HSCODE_2', axis=1)
label2 = ['Bot 5: CH14', 'Bot 4: CH01',
'Bot 3: CH50', 'Bot 2: CH45', 'Bot 1: CH06']
# PLOT ALL TOP 15 wrt DUTIABLE VALUE
# Highlight Top 1 and Top 2
ax = df_bot15_hs_dv_pvt.drop(columns='14').plot(
figsize=(15, 10), color='#9c9a9d', alpha=0.3)
df_bot15_hs_dv_pvt['14'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
style = dict(size=10, color='#0038a8')
ax.text('2018', .8E7, 'CHAPTER 14', ha='left', **style)
style = dict(size=10, color='#3cb043')
ax.text('2016', 66003937, 'X CHAPTER 06', ha='left', **style)
style = dict(size=10, color=c_red)
ax.text('2022', 40623345, 'X CHAPTER 97', ha='left', **style)
plt.legend(loc='upper right')
plt.suptitle('Out of the 5 bottom chapters, it was only chapter 14 \n'
'that was present from 2016 to 2022 based on dutiable value.',
fontsize=18, color=c_blue)
plt.title(
'Chapter 06 & 97 were only present in 2016 and 2022, respectively',
fontsize=16)
plt.show()
# Create DF for Bottom 15 wrt DT
df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
list_bot_hscode_dt)]
# Create DF gb Month-Year & HS
df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
.groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
'HSCODE_2'])
.sum()
.nsmallest(15, 'SUM_DUTIES_AND_TAXES')
.sort_values(by=['MONTH_YEAR',
'SUM_DUTIES_AND_TAXES'],
ascending=[True, False]))
# Create PVT table
df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
values='SUM_DUTIES_AND_TAXES',
index='MONTH_YEAR',
columns='HSCODE_2'))
# Plot 1
ax = df_bot15_hs_dt_pvt.drop(columns=['14', '50']).plot(
figsize=(15, 10), color='#9c9a9d', alpha=0.3)
df_bot15_hs_dt_pvt['14'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
df_bot15_hs_dt_pvt['50'].plot(color='#fcd116', ax=ax, legend=True, lw=5)
style = dict(size=10, color='#0038a8')
ax.text('2017', .46E6, 'CHAPTER 14', ha='left', **style)
style = dict(size=10, color='#fcd116')
ax.text('2016', .4E6, 'CHAPTER 50', ha='left', **style)
plt.legend(loc='upper right')
plt.suptitle('In terms of duties & taxes, instead of chapter 14, \n'
'it was chapter 50 that was present from 2016 to 2023',
fontsize=18, color=c_blue)
plt.title(
'Chapter 14 based on duties & taxes only had values from 2018 to 2020',
fontsize=16)
plt.show()
def prrd_top_countries():
"""Returns plots of top 10 countries by dutiable value and by
duties and taxes."""
# Create data frames for top 10 by country_export
sql = """
SELECT *
FROM summary_year_cntry
"""
df_imports_agg_by_yr_cntry = pd.read_sql(sql, conn)
prrd_country = (df_imports_agg_by_yr_cntry[(
df_imports_agg_by_yr_cntry['MONTH_YEAR'] >= '2016-06-01') &
(df_imports_agg_by_yr_cntry['MONTH_YEAR'] <= '2022-06-30')])
n = 10
SDV_top10 = (prrd_country
.groupby(['COUNTRY_EXPORT'])['SUM_DUTIABLE_VALUE_PHP']
.sum().sort_values(ascending=False)[:n].reset_index())
n = 10
SDT_top10 = (prrd_country
.groupby(['COUNTRY_EXPORT'])['SUM_DUTIES_AND_TAXES']
.sum().sort_values(ascending=False)[:n].reset_index())
# Plot based on Dutiable Value
y = SDV_top10['SUM_DUTIABLE_VALUE_PHP']
x = SDV_top10['COUNTRY_EXPORT']
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
ax.barh(x, y, color=['#0038a8', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray'])
ax.invert_yaxis()
ax.set_xlabel('Dutiable value in trillion PHP')
ax.set_title("China is the top country based on dutiable value "
"during PRRD's term")
ax.yaxis.set_label_position("right")
ax.yaxis.tick_right()
plt.show()
# Plot based on duties and taxes
y = SDT_top10['SUM_DUTIES_AND_TAXES']
x = SDT_top10['COUNTRY_EXPORT']
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
ax.barh(x, y, color=['#0038a8', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray', 'lightgray', 'lightgray',
'lightgray', 'lightgray'])
ax.invert_yaxis()
ax.set_xlabel('Duties and taxes in billion PHP')
ax.set_title("China is also the highest paying country based on duties and "
"taxes during PRRD's term")
ax.yaxis.set_label_position("right")
ax.yaxis.tick_right()
plt.show()
# Show amount & % to total of DV and DT during PRRD's Term
total_SDV = prrd_country['SUM_DUTIABLE_VALUE_PHP'].sum()
total_SDT = prrd_country['SUM_DUTIES_AND_TAXES'].sum()
print(
f"sum of dutiable value during PRRD's Term = {total_SDV/1e12:,.2f} trillion")
print(
f"sum of duties and taxes during PRRD's Term = {total_SDT/1e12:,.2f} trillion")
SDV_top10['% OF TOTAL'] = (
SDV_top10['SUM_DUTIABLE_VALUE_PHP']/total_SDV)*100
display(SDV_top10)
SDT_top10['% OF TOTAL'] = (SDT_top10['SUM_DUTIES_AND_TAXES']/total_SDT)*100
display(SDT_top10)
def yearly_top15_countries_dv():
"""Returns a plot of yearly top 15 countries by dutiable value."""
# Plot top countries by dutiable value
topn = 100
topn_highest = (df_summary_year_cntry.groupby(
[pd.Grouper(key='MONTH_YEAR', freq='Y'),
'COUNTRY_EXPORT'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP'))
topn_highest_yr_cntry = topn_highest.sort_index().reset_index()
top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()
paramlist = r'?'
for i in range(1, len(top_countries)):
paramlist = paramlist + r', ?'
sql = f"""
SELECT substr(month_year,1,4) as YR,
country_export,
sum(SUM_DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
sum(SUM_NET_MASS_KGS) as SUM_NET_MASS_KGS
FROM summary_year_cntry
WHERE country_export in ({paramlist})
GROUP BY YR, country_export
"""
top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='YR',
columns='COUNTRY_EXPORT'))
# Plot
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['UNITED STATES', 'CHINA']
list_cntry = [c for c in top_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)
ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
color=c_red, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'UNITED STATES'], linestyle='dashed',
color=c_green, linewidth=2,
zorder=5)
style = dict(size=10, color=c_red)
ax.text('2015', 550e9, 'CHINA', ha='center',
**style)
style = dict(size=10, color=c_green)
ax.text('2015', 340e9, 'USA', ha='right',
**style)
ax.yaxis.set_major_formatter(billion_formatter)
ax.set_xlabel('YEAR')
ax.set_ylabel('DUTIABLE VALUE (in Billions PHP)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = "China importations way above USA\nimportations from 2015 to 2022."
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.show()
def top15_countries_increases_dv():
"""Returns a plot of top 15 countries based on dutiable value which
highlights only countries that showed an increase in 2020."""
# Plot top countries by dutiable value
topn = 100
topn_highest = (df_summary_year_cntry.groupby(
[pd.Grouper(key='MONTH_YEAR', freq='Y'),
'COUNTRY_EXPORT'])
.sum()
.nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP'))
topn_highest_yr_cntry = topn_highest.sort_index().reset_index()
top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()
paramlist = r'?'
for i in range(1, len(top_countries)):
paramlist = paramlist + r', ?'
sql = f"""
SELECT substr(month_year,1,4) as YR,
country_export,
sum(SUM_DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
sum(SUM_NET_MASS_KGS) as SUM_NET_MASS_KGS
FROM summary_year_cntry
WHERE country_export in ({paramlist})
GROUP BY YR, country_export
"""
top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_DUTIABLE_VALUE_PHP',
index='YR',
columns='COUNTRY_EXPORT'))
# Plot country_export that showed increase in importation
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['CHINA', 'JAPAN', 'KOREA', 'HONG KONG']
list_cntry = [c for c in top_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)
ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
color=c_red, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'JAPAN'],
color=c_yellow, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'KOREA'],
color=c_blue, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'HONG KONG'],
color=c_black, linewidth=2,
zorder=5)
ax.yaxis.set_major_formatter(billion_formatter)
ax.text('2020', 1300e9, 'CHINA', ha='center', zorder=5,
size=10, color=c_red)
ax.text('2020', 900e9, 'JAPAN', ha='center', zorder=4,
size=10, color=c_yellow)
ax.text('2020', 600e9, 'KOREA', ha='center', zorder=3,
size=10, color=c_blue)
ax.text('2020', 400e9, 'HONG KONG', ha='center', zorder=2,
size=10, color=c_black)
ax.set_xlabel('YEAR')
ax.set_ylabel('DUTIABLE VALUE (in Billions PHP)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = """Increased importations from four countries in 2020"""
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.show()
def yearly_top15_countries_dt():
"""Returns a plot of top 15 countries based on duties and taxes."""
# Plot top countries by duties and taxes
topn = 100
topn_highest = (df_summary_year_cntry.groupby(
[pd.Grouper(key='MONTH_YEAR', freq='Y'),
'COUNTRY_EXPORT'])
.sum()
.nlargest(topn, 'SUM_DUTIES_AND_TAXES'))
topn_highest_yr_cntry = topn_highest.sort_index().reset_index()
top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()
paramlist = r'?'
for i in range(1, len(top_countries)):
paramlist = paramlist + r', ?'
sql = f"""
SELECT substr(month_year,1,4) as YR,
country_export,
sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES
FROM summary_year_cntry
WHERE country_export in ({paramlist})
GROUP BY YR, country_export
"""
top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_DUTIES_AND_TAXES',
index='YR',
columns='COUNTRY_EXPORT'))
# Plot
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['UNITED STATES', 'CHINA']
list_cntry = [c for c in top_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)
ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
color=c_red, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'UNITED STATES'], linestyle='dashed',
color=c_green, linewidth=2,
zorder=5)
ax.text('2015', 80e9, 'CHINA', ha='center',
size=10, color=c_red)
ax.text('2015', 25e9, 'USA', ha='right',
size=10, color=c_green)
ax.yaxis.set_major_formatter(billion_formatter)
ax.set_xlabel('YEAR')
ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = """From 2015 to 2022, China import
duties and taxes (DT) are far above of USA."""
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.show()
# Plot top contributors based on duties and taxes
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['CHINA', 'JAPAN', 'KOREA', 'HONG KONG']
list_cntry = [c for c in top_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray,
label=list_cntry)
ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
color=c_red, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'JAPAN'],
color=c_yellow, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'KOREA'],
color=c_blue, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'HONG KONG'],
color=c_black, linewidth=2,
zorder=5)
ax.yaxis.set_major_formatter(billion_formatter)
ax.text('2020', 195e9, 'CHINA', ha='left', zorder=5,
size=10, color=c_red)
ax.text('2020', 100e9, 'JAPAN', ha='left', zorder=4,
size=10, color=c_yellow)
ax.text('2020', 130e9, 'KOREA', ha='left', zorder=3,
size=10, color=c_blue)
ax.text('2020', 35e9, 'HONG KONG', ha='left', zorder=2,
size=10, color=c_black)
ax.set_xlabel('YEAR')
ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
# place a text box in upper left in axes coords
textstr = """Importations from China, Korea and Japan are
the top contributors of Duties and Taxes since 2020"""
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.show()
def rice_volume():
"""Returns a plot of yearly rice importation based on volume, highlighting
the top 2 countries."""
# Plot yearly trend based on volume
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_NETMASS_MT',
index='YR',
columns='COUNTRY_EXPORT'))
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['VIET NAM', 'THAILAND']
list_cntry = [c for c in list_top_rice_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2,
label=list_cntry, color=c_gray)
ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
zorder=5)
ax.text('2015', 1e6, 'VIET NAM', ha='center',
size=10, color=c_blue)
ax.text('2015', 0.5e6, 'THAILAND', ha='center',
size=10, color=c_red)
ax.yaxis.set_major_formatter(million_formatter)
ax.set_xlabel('YEAR')
ax.set_ylabel('VOLUME (in Metric Tons)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = """Since 2019, the volume of rice importation
from Viet Nam more than doubled from 0.8 million MT
to 2.5 million Metric Tons."""
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.grid(axis='y')
plt.show()
def rice_dt():
"""Returns a plot of rice importation based on duties and taxes with
the top 2 countries highlighted."""
# Plot based on duties and taxes
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_DUTIES_AND_TAXES',
index='YR',
columns='COUNTRY_EXPORT'))
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
list_exclude = ['VIET NAM', 'THAILAND']
list_cntry = [c for c in list_top_rice_countries if c not in list_exclude]
ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray,
label=list_cntry)
ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
zorder=5)
ax.text('2022', 14e9, 'VIET NAM', ha='center',
size=10, color=c_blue)
ax.text('2022', 1.5e9, 'THAILAND', ha='center',
size=10, color=c_red)
ax.yaxis.set_major_formatter(billion_formatter)
ax.set_xlabel('YEAR')
ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = """Since 2019, the duties and taxes of rice importation
from Viet Nam increased 2.5 times."""
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
# ax.legend(loc='center right')
plt.grid(axis='y')
plt.show()
def rice_costperkilo():
"""Returns a plot of yearly imported rice's cost per kilo by country."""
# Plot effective cost of rice per kg
top_countries_per_yr = pd.read_sql(
sql, conn, params=list_top_rice_countries)
df_top_cntry_pvt = (top_countries_per_yr.pivot(
values='SUM_DV_DT_PER_KGS',
index='YR',
columns='COUNTRY_EXPORT'))
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
zorder=5)
ax.plot(df_top_cntry_pvt.loc['2019':, 'MYANMAR (former BURMA)'], color=c_black,
linewidth=2, zorder=5)
ax.plot(df_top_cntry_pvt.loc['2017':, 'PAKISTAN'], color=c_yellow, linewidth=2,
zorder=5)
ax.text('2015', 20.2, 'VIET NAM', ha='center',
size=10, color=c_blue)
ax.text('2015', 21.3, 'THAILAND', ha='center',
size=10, color=c_red)
ax.text('2017', 21.8, 'PAKISTAN', ha='left',
size=10, color=c_yellow)
ax.text('2019', 19.7, 'MYANMAR', ha='left',
size=10, color=c_black)
ax.set_xlabel('YEAR')
ax.set_ylabel('IMPORTATION COST PER KILOGRAM (in PHP)')
# place a text box in upper left in axes coords
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
textstr = """The cost of rice importation ranged
between 20 to 28 pesos per kilogram."""
ax.text(0.05, 0.85, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', bbox=props)
plt.grid(axis='y')
plt.show()
def rice_per_capita():
"""Returns a plot of yearly trend in rice volume vs. the change
in population between 2015 and 2020."""
# Plot
df_annual_rice_import = top_countries_per_yr.groupby(
'YR').sum().reset_index()
df_annual_rice_import['DT_PER_MT'] = (df_annual_rice_import['SUM_DUTIES_AND_TAXES'] /
df_annual_rice_import['SUM_NETMASS_MT'])
df_annual_rice_import.set_index(['YR'], inplace=True)
# Add the Population fields
df_popn = pd.melt(df_census.query("LOCATION == 'PHILIPPINES'"),
id_vars='LOCATION', value_name='POPULATION',
var_name='YEAR').drop('LOCATION', axis=1)
df_popn['YEAR'] = df_popn['YEAR'].str[-6:-2]
df_popn.rename(columns={'YEAR': 'YR'}, inplace=True)
df_rice_popn = pd.merge(df_annual_rice_import,
df_popn, how='left', on='YR')
df_rice_popn.set_index(['YR'], inplace=True)
df_rice_popn['KGS_PER_POPN'] = (df_rice_popn['SUM_NETMASS_MT'] * 1000 /
df_rice_popn['POPULATION'])
fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
ax.plot(df_rice_popn['SUM_NETMASS_MT'], linewidth=2,
label='Rice Volume', zorder=1)
ax2 = ax.twinx()
ax2 = df_rice_popn['POPULATION'].plot(kind='bar', linewidth=2,
secondary_y=True, zorder=2,
label='Population', color=c_yellow,
alpha=0.5)
ax.yaxis.set_major_formatter(million_formatter)
ax2.yaxis.set_major_formatter(million_formatter)
ax.set_xlabel('YEAR')
ax.set_ylabel('VOLUME (in Metric Tons)')
ax2.set_ylabel('POPULATION')
ax.set_ylim(0, 4e6)
ax2.set_ylim(100e6, 110e6)
ax2.legend(loc='upper right')
ax.legend(loc='upper left')
plt.grid(axis='y')
plt.show()
The customs and census datasets were obtained via a jojie-collected public dataset (directories: /mnt/data/public/customs and /mnt/data/public/census).
The raw data are all in xlsx file format which were saved to pickle files and sqlite3 databases.
The detailed steps performed, including the relevant documents used and created are documented below:
| Jupyter Notebooks | Pickle Files created |
|---|---|
| customs-sqllite-2015.ipynb | 2015_imports.pkl |
| customs-sqllite-2016.ipynb | 2016_imports.pkl |
| customs-sqllite-2017.ipynb | 2017_imports.pkl |
| customs-sqllite-2018.ipynb | 2018_imports.pkl |
| customs-sqllite-2019.ipynb | 2019_imports.pkl |
| customs-sqllite-2020.ipynb | 2020_imports.pkl |
| customs-sqllite-2021.ipynb | 2021_imports.pkl |
| customs-sqllite-2022.ipynb | 2022_imports.pkl |
| Jupyter Notebook | Excel File created |
|---|---|
| colname_extraction.ipynb | Column_Dict.xlsx |
| Jupyter Notebooks | SQLite3 Databases created |
|---|---|
| customs-create-sqllite-2015.ipynb | 2015-customs.db |
| customs-create-sqllite-2016.ipynb | 2016-customs.db |
| customs-create-sqllite-2017.ipynb | 2017-customs.db |
| customs-create-sqllite-2018.ipynb | 2018-customs.db |
| customs-create-sqllite-2019.ipynb | 2019-customs.db |
| customs-create-sqllite-2020.ipynb | 2020-customs.db |
| customs-create-sqllite-2021.ipynb | 2021-customs.db |
| customs-create-sqllite-2022.ipynb | 2022-customs.db |
| Jupyter Notebook | SQLite3 database created | Excel file created |
|---|---|---|
| census-create-sqllite.ipynb | census.db | census.xlsx |
| Jupyter Notebook | SQLite3 database created | Excel file created |
|---|---|---|
| imports_combined.ipynb | imports_combined.db | imports_rowcounts.xlsx |
Step 6: Inserted Reference Tables (Reference-Tables.ipynb) to 'imports_combined.db' database.
Step 7: Place the above files in the same directory.
# Connect to SQL DB containing Philippine importation and census data
conn = sqlite3.connect('imports_combined.db')
conn2 = sqlite3.connect('census.db')
# Read the sql query below and save it to a data frame
# sql = """
# CREATE TABLE summary AS
# SELECT MONTH_YEAR,
# substr('0000000000'||HS_CODE, -11, 11) as HS_CODE,
# PREFERENTIAL_CODE,
# SUM(DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
# SUM(VAT_BASE) as SUM_VAT_BASE,
# SUM(VAT_PAID) as SUM_VAT_PAID,
# SUM(DUTY_PAID) as SUM_DUTY_PAID,
# SUM(DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
# SUM(EXCISE_ADVALOREM_PAID) as SUM_EXCISE_ADVALOREM_PAID,
# SUM(NET_MASS_KGS) as SUM_NET_MASS_KGS,
# AVG(DUTIABLE_VALUE_PHP) as AVG_DUTIABLE_VALUE_PHP,
# AVG(VAT_BASE) as AVG_VAT_BASE,
# AVG(VAT_PAID) as AVG_VAT_PAID,
# AVG(DUTY_PAID) as AVG_DUTY_PAID,
# AVG(DUTIES_AND_TAXES) as AVG_DUTIES_AND_TAXES,
# AVG(EXCISE_ADVALOREM_PAID) as AVG_EXCISE_ADVALOREM_PAID,
# AVG(NET_MASS_KGS) as AVG_NET_MASS_KGS,
# MIN(DUTIABLE_VALUE_PHP) as MIN_DUTIABLE_VALUE_PHP,
# MIN(VAT_BASE) as MIN_VAT_BASE,
# MIN(VAT_PAID) as MIN_VAT_PAID,
# MIN(DUTY_PAID) as MIN_DUTY_PAID,
# MIN(DUTIES_AND_TAXES) as MIN_DUTIES_AND_TAXES,
# MIN(EXCISE_ADVALOREM_PAID) as MIN_EXCISE_ADVALOREM_PAID,
# MIN(NET_MASS_KGS) as MIN_NET_MASS_KGS,
# MAX(DUTIABLE_VALUE_PHP) as MAX_DUTIABLE_VALUE_PHP,
# MAX(VAT_BASE) as MAX_VAT_BASE,
# MAX(VAT_PAID) as MAX_VAT_PAID,
# MAX(DUTY_PAID) as MAX_DUTY_PAID,
# MAX(DUTIES_AND_TAXES) as MAX_DUTIES_AND_TAXES,
# MAX(EXCISE_ADVALOREM_PAID) as MAX_EXCISE_ADVALOREM_PAID,
# MAX(NET_MASS_KGS) as MAX_NET_MASS_KGS,
# COUNT(*) as COUNT_ROWS
# FROM imports
# GROUP BY MONTH_YEAR, HS_CODE, PREFERENTIAL_CODE
# """
# df_imports_agg = pd.read_sql(sql, conn)
# Add country_export to the db
# sql = """
# CREATE TABLE summary_year_cntry AS
# SELECT MONTH_YEAR,
# COUNTRY_EXPORT,
# SUM(DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
# SUM(VAT_BASE) as SUM_VAT_BASE,
# SUM(VAT_PAID) as SUM_VAT_PAID,
# SUM(DUTY_PAID) as SUM_DUTY_PAID,
# SUM(DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
# SUM(EXCISE_ADVALOREM_PAID) as SUM_EXCISE_ADVALOREM_PAID,
# SUM(NET_MASS_KGS) as SUM_NET_MASS_KGS,
# AVG(DUTIABLE_VALUE_PHP) as AVG_DUTIABLE_VALUE_PHP,
# AVG(VAT_BASE) as AVG_VAT_BASE,
# AVG(VAT_PAID) as AVG_VAT_PAID,
# AVG(DUTY_PAID) as AVG_DUTY_PAID,
# AVG(DUTIES_AND_TAXES) as AVG_DUTIES_AND_TAXES,
# AVG(EXCISE_ADVALOREM_PAID) as AVG_EXCISE_ADVALOREM_PAID,
# AVG(NET_MASS_KGS) as AVG_NET_MASS_KGS,
# MIN(DUTIABLE_VALUE_PHP) as MIN_DUTIABLE_VALUE_PHP,
# MIN(VAT_BASE) as MIN_VAT_BASE,
# MIN(VAT_PAID) as MIN_VAT_PAID,
# MIN(DUTY_PAID) as MIN_DUTY_PAID,
# MIN(DUTIES_AND_TAXES) as MIN_DUTIES_AND_TAXES,
# MIN(EXCISE_ADVALOREM_PAID) as MIN_EXCISE_ADVALOREM_PAID,
# MIN(NET_MASS_KGS) as MIN_NET_MASS_KGS,
# MAX(DUTIABLE_VALUE_PHP) as MAX_DUTIABLE_VALUE_PHP,
# MAX(VAT_BASE) as MAX_VAT_BASE,
# MAX(VAT_PAID) as MAX_VAT_PAID,
# MAX(DUTY_PAID) as MAX_DUTY_PAID,
# MAX(DUTIES_AND_TAXES) as MAX_DUTIES_AND_TAXES,
# MAX(EXCISE_ADVALOREM_PAID) as MAX_EXCISE_ADVALOREM_PAID,
# MAX(NET_MASS_KGS) as MAX_NET_MASS_KGS,
# COUNT(*) as COUNT_ROWS
# FROM imports
# GROUP BY MONTH_YEAR, COUNTRY_EXPORT
# ORDER BY MONTH_YEAR, COUNTRY_EXPORT
# """
# conn.execute(sql)
# Save the file to a pickle file for shorter processing time when we access
# # the same data moving forward
# df_imports_agg.to_pickle('df_imports_agg.pkl')
# Remove commas in country_export column
# sql = """
# UPDATE summary
# SET country_export = substr(country_export, 1, instr(country_export,',')-1)
# WHERE country_export like '%,%'
# """
# conn.execute(sql)
# Load pickle file, read-only
with open('df_imports_agg.pkl', 'rb') as file:
df_imports_agg = pickle.load(file)
# Preview data
display(df_imports_agg.head())
# Get the dimensions of the df
print(f'Number of rows: {df_imports_agg.shape[0]}\n'
f'Number of columns: {df_imports_agg.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
print(df_imports_agg.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_imports_agg.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_imports_agg.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(df_imports_agg.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(df_imports_agg.isna().sum() / (len(df_imports_agg))*100)
| MONTH_YEAR | HS_CODE | PREFERENTIAL_CODE | SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | ... | MIN_EXCISE_ADVALOREM_PAID | MIN_NET_MASS_KGS | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | MAX_NET_MASS_KGS | COUNT_ROWS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-1 | 10019019000 | None | 2,413,111,061.00 | 2,440,378,069.00 | 0.00 | 0.00 | 0.00 | NaN | 153,329,722.00 | ... | NaN | 1000787.0 | 369,253,676.00 | 372,889,558.00 | 0.00 | 0.00 | 0.00 | NaN | 8006296.0 | 20 |
| 1 | 2015-1 | 10019099000 | None | 1,448,184,761.00 | 1,571,805,522.00 | 0.00 | 101,372,917.00 | 101,372,917.00 | NaN | 125,329,720.00 | ... | NaN | 0.0 | 216,549,708.00 | 234,706,251.00 | 0.00 | 15,158,479.00 | 15,158,479.00 | NaN | 9460325.0 | 32 |
| 2 | 2015-1 | 10019099000 | AIFTA | 18,672,091.00 | 19,794,256.00 | 0.00 | 746,881.00 | 746,881.00 | NaN | 2,005,240.00 | ... | NaN | 229540.0 | 6,410,765.00 | 6,789,302.00 | 0.00 | 256,430.00 | 256,430.00 | NaN | 691410.0 | 5 |
| 3 | 2015-1 | 10019099000 | ANFTA | 262,946,606.00 | 266,354,909.00 | 0.00 | 0.00 | 0.00 | NaN | 20,094,214.00 | ... | NaN | 16129544.0 | 206,744,629.00 | 209,332,028.00 | 0.00 | 0.00 | 0.00 | NaN | 99390.0 | 13 |
| 4 | 2015-1 | 10019919000 | None | 77,393,616.00 | 78,178,366.00 | 9,381,401.00 | 0.00 | 9,381,401.00 | NaN | 4,470,871.00 | ... | NaN | 406149.0 | 17,263,204.00 | 17,433,569.00 | 2,092,028.00 | 0.00 | 2,092,028.00 | NaN | 989386.0 | 7 |
5 rows × 32 columns
Number of rows: 1324823 Number of columns: 32 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 1324823 entries, 0 to 1324822 Data columns (total 32 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MONTH_YEAR 1324823 non-null object 1 HS_CODE 1324823 non-null object 2 PREFERENTIAL_CODE 671722 non-null object 3 SUM_DUTIABLE_VALUE_PHP 1324823 non-null float64 4 SUM_VAT_BASE 1324823 non-null float64 5 SUM_VAT_PAID 1324823 non-null float64 6 SUM_DUTY_PAID 1324823 non-null float64 7 SUM_DUTIES_AND_TAXES 1324823 non-null float64 8 SUM_EXCISE_ADVALOREM_PAID 1105652 non-null float64 9 SUM_NET_MASS_KGS 1324806 non-null float64 10 AVG_DUTIABLE_VALUE_PHP 1324823 non-null float64 11 AVG_VAT_BASE 1324823 non-null float64 12 AVG_VAT_PAID 1324823 non-null float64 13 AVG_DUTY_PAID 1324823 non-null float64 14 AVG_DUTIES_AND_TAXES 1324823 non-null float64 15 AVG_EXCISE_ADVALOREM_PAID 1105652 non-null float64 16 AVG_NET_MASS_KGS 1324806 non-null float64 17 MIN_DUTIABLE_VALUE_PHP 1324823 non-null float64 18 MIN_VAT_BASE 1324823 non-null float64 19 MIN_VAT_PAID 1324823 non-null float64 20 MIN_DUTY_PAID 1324823 non-null float64 21 MIN_DUTIES_AND_TAXES 1324823 non-null float64 22 MIN_EXCISE_ADVALOREM_PAID 1105652 non-null float64 23 MIN_NET_MASS_KGS 1324806 non-null object 24 MAX_DUTIABLE_VALUE_PHP 1324823 non-null float64 25 MAX_VAT_BASE 1324823 non-null float64 26 MAX_VAT_PAID 1324823 non-null float64 27 MAX_DUTY_PAID 1324823 non-null float64 28 MAX_DUTIES_AND_TAXES 1324823 non-null float64 29 MAX_EXCISE_ADVALOREM_PAID 1105652 non-null float64 30 MAX_NET_MASS_KGS 1324806 non-null object 31 COUNT_ROWS 1324823 non-null int64 dtypes: float64(26), int64(1), object(5) memory usage: 323.4+ MB None Summary statistics for numerical columns:
| SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | AVG_DUTIABLE_VALUE_PHP | AVG_VAT_BASE | AVG_VAT_PAID | ... | MIN_DUTY_PAID | MIN_DUTIES_AND_TAXES | MIN_EXCISE_ADVALOREM_PAID | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | COUNT_ROWS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,105,652.00 | 1,324,806.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | ... | 1,324,823.00 | 1,324,823.00 | 1,105,652.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,324,823.00 | 1,105,652.00 | 1,324,823.00 |
| mean | 34,929,634.89 | 36,822,070.21 | 3,633,220.59 | 626,018.18 | 4,113,051.56 | 677,901.78 | 717,764.97 | 2,624,132.48 | 2,748,433.15 | 366,429.17 | ... | 16,009.30 | 143,363.82 | 17,917.48 | 8,728,100.18 | 9,340,447.21 | 949,754.37 | 158,984.01 | 1,025,553.28 | 174,129.59 | 25.13 |
| std | 732,322,462.07 | 775,984,732.53 | 79,093,379.28 | 31,437,123.11 | 98,258,365.54 | 36,661,977.61 | 19,769,234.28 | 47,541,731.53 | 49,670,521.35 | 6,466,321.70 | ... | 423,020.96 | 2,594,409.27 | 942,233.57 | 570,526,018.53 | 615,282,651.97 | 57,513,699.75 | 30,281,140.47 | 71,766,654.04 | 18,925,504.66 | 126.76 |
| min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| 25% | 166,852.50 | 179,316.38 | 24,841.60 | 0.00 | 8,251.00 | 0.00 | 377.12 | 59,091.00 | 64,159.46 | 8,235.97 | ... | 0.00 | 0.00 | 0.00 | 115,299.68 | 123,736.50 | 17,635.74 | 0.00 | 6,465.77 | 0.00 | 1.00 |
| 50% | 1,668,926.68 | 1,756,291.00 | 191,009.00 | 3.00 | 149,662.00 | 0.00 | 7,460.18 | 304,490.32 | 322,282.50 | 37,755.95 | ... | 0.00 | 799.16 | 0.00 | 882,473.00 | 924,721.00 | 105,762.00 | 3.00 | 87,547.00 | 0.00 | 3.00 |
| 75% | 9,419,254.94 | 9,873,365.88 | 1,012,921.25 | 56,728.00 | 914,510.50 | 0.00 | 52,630.51 | 1,120,654.80 | 1,163,528.08 | 140,875.29 | ... | 124.00 | 20,126.82 | 0.00 | 3,424,216.00 | 3,570,642.91 | 402,865.00 | 28,132.89 | 357,465.67 | 0.00 | 13.00 |
| max | 384,390,862,552.01 | 384,875,563,099.94 | 46,184,983,855.11 | 34,208,739,859.72 | 65,715,485,655.68 | 14,353,591,500.00 | 3,410,091,691.00 | 31,593,760,764.57 | 33,929,375,520.14 | 4,294,389,926.00 | ... | 139,739,494.00 | 648,718,679.46 | 210,221,920.00 | 242,616,739,570.34 | 262,056,064,649.71 | 31,446,727,757.96 | 34,144,111,833.19 | 65,590,839,591.15 | 13,499,424,000.00 | 13,505.00 |
8 rows × 27 columns
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| MONTH_YEAR | 1324823 | 93 | 2015-9 | 132186 |
| HS_CODE | 1324823 | 17564 | 84799040000 | 3623 |
| PREFERENTIAL_CODE | 671722 | 90244 | ACFTA | 289827 |
| MIN_NET_MASS_KGS | 1324806 | 196538 | 0.01 | 84932 |
| MAX_NET_MASS_KGS | 1324806 | 229072 | 1.0 | 12932 |
Check for null columns:
MONTH_YEAR 0 HS_CODE 0 PREFERENTIAL_CODE 653101 SUM_DUTIABLE_VALUE_PHP 0 SUM_VAT_BASE 0 SUM_VAT_PAID 0 SUM_DUTY_PAID 0 SUM_DUTIES_AND_TAXES 0 SUM_EXCISE_ADVALOREM_PAID 219171 SUM_NET_MASS_KGS 17 AVG_DUTIABLE_VALUE_PHP 0 AVG_VAT_BASE 0 AVG_VAT_PAID 0 AVG_DUTY_PAID 0 AVG_DUTIES_AND_TAXES 0 AVG_EXCISE_ADVALOREM_PAID 219171 AVG_NET_MASS_KGS 17 MIN_DUTIABLE_VALUE_PHP 0 MIN_VAT_BASE 0 MIN_VAT_PAID 0 MIN_DUTY_PAID 0 MIN_DUTIES_AND_TAXES 0 MIN_EXCISE_ADVALOREM_PAID 219171 MIN_NET_MASS_KGS 17 MAX_DUTIABLE_VALUE_PHP 0 MAX_VAT_BASE 0 MAX_VAT_PAID 0 MAX_DUTY_PAID 0 MAX_DUTIES_AND_TAXES 0 MAX_EXCISE_ADVALOREM_PAID 219171 MAX_NET_MASS_KGS 17 COUNT_ROWS 0 dtype: int64
Display % of null columns:
MONTH_YEAR 0.00 HS_CODE 0.00 PREFERENTIAL_CODE 49.30 SUM_DUTIABLE_VALUE_PHP 0.00 SUM_VAT_BASE 0.00 SUM_VAT_PAID 0.00 SUM_DUTY_PAID 0.00 SUM_DUTIES_AND_TAXES 0.00 SUM_EXCISE_ADVALOREM_PAID 16.54 SUM_NET_MASS_KGS 0.00 AVG_DUTIABLE_VALUE_PHP 0.00 AVG_VAT_BASE 0.00 AVG_VAT_PAID 0.00 AVG_DUTY_PAID 0.00 AVG_DUTIES_AND_TAXES 0.00 AVG_EXCISE_ADVALOREM_PAID 16.54 AVG_NET_MASS_KGS 0.00 MIN_DUTIABLE_VALUE_PHP 0.00 MIN_VAT_BASE 0.00 MIN_VAT_PAID 0.00 MIN_DUTY_PAID 0.00 MIN_DUTIES_AND_TAXES 0.00 MIN_EXCISE_ADVALOREM_PAID 16.54 MIN_NET_MASS_KGS 0.00 MAX_DUTIABLE_VALUE_PHP 0.00 MAX_VAT_BASE 0.00 MAX_VAT_PAID 0.00 MAX_DUTY_PAID 0.00 MAX_DUTIES_AND_TAXES 0.00 MAX_EXCISE_ADVALOREM_PAID 16.54 MAX_NET_MASS_KGS 0.00 COUNT_ROWS 0.00 dtype: float64
# Extract importations data summary by year and country
sql = """
SELECT *
FROM summary_year_cntry
"""
df_summary_year_cntry = pd.read_sql(sql, conn)
df_summary_year_cntry['MONTH_YEAR'] = (pd.to_datetime(
df_summary_year_cntry['MONTH_YEAR'],
format='%Y-%m'))
# Preview data
display(df_summary_year_cntry.head())
# Get the dimensions of the df
print(f'Number of rows: {df_summary_year_cntry.shape[0]}\n'
f'Number of columns: {df_summary_year_cntry.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
display(df_summary_year_cntry.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_summary_year_cntry.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_summary_year_cntry.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(df_summary_year_cntry.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(df_summary_year_cntry.isna().sum() / (len(df_summary_year_cntry))*100)
| MONTH_YEAR | COUNTRY_EXPORT | SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | AVG_DUTIABLE_VALUE_PHP | ... | MIN_EXCISE_ADVALOREM_PAID | MIN_NET_MASS_KGS | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | MAX_NET_MASS_KGS | COUNT_ROWS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 | None | 1,064,784,798.00 | 1,095,264,620.00 | 129,498,498.00 | 19,800,478.00 | 221,626,576.00 | 342,464.00 | 20,482,566.33 | 570,930.19 | ... | 0.00 | 0.01 | 274,480,527.00 | 276,138,966.00 | 33,136,675.00 | 4,554,768.00 | 33,136,675.00 | 102,757.00 | 9892.8 | 1865 |
| 1 | 2015-01-01 | ARGENTINA | 1,528,122,757.00 | 1,651,336,941.00 | 11,609,722.00 | 108,126,300.00 | 119,925,877.00 | 181,739.00 | 83,521,009.88 | 23,153,375.11 | ... | 4.00 | 1.0 | 559,224,626.00 | 563,529,971.00 | 970,516.00 | 67,755,299.00 | 67,755,299.00 | 180,835.00 | 99730.0 | 66 |
| 2 | 2015-01-01 | AUSTRALIA | 3,762,129,469.00 | 3,867,970,414.00 | 280,259,798.00 | 47,513,309.00 | 335,666,913.00 | 7,701,001.00 | 174,863,099.36 | 2,032,484.86 | ... | 0.00 | 0.0 | 206,744,629.00 | 209,332,028.00 | 10,418,823.00 | 2,456,098.00 | 10,418,823.00 | 4,194,983.00 | 999.0 | 1851 |
| 3 | 2015-01-01 | AUSTRIA | 133,524,148.00 | 138,460,596.00 | 13,736,575.00 | 3,875,369.00 | 17,620,595.00 | 8,651.00 | 1,044,444.57 | 809,237.26 | ... | 8,651.00 | 0.0 | 17,372,400.00 | 17,584,537.00 | 2,110,144.00 | 372,951.00 | 2,283,868.00 | 8,651.00 | 96.93 | 165 |
| 4 | 2015-01-01 | Andorra | 606,871.00 | 705,233.00 | 84,628.00 | 91,030.00 | 175,658.00 | NaN | 890.00 | 606,871.00 | ... | NaN | 890.0 | 606,871.00 | 705,233.00 | 84,628.00 | 91,030.00 | 175,658.00 | NaN | 890.0 | 1 |
5 rows × 31 columns
Number of rows: 12488 Number of columns: 31 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 12488 entries, 0 to 12487 Data columns (total 31 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MONTH_YEAR 12488 non-null datetime64[ns] 1 COUNTRY_EXPORT 12415 non-null object 2 SUM_DUTIABLE_VALUE_PHP 12488 non-null float64 3 SUM_VAT_BASE 12488 non-null float64 4 SUM_VAT_PAID 12488 non-null float64 5 SUM_DUTY_PAID 12488 non-null float64 6 SUM_DUTIES_AND_TAXES 12488 non-null float64 7 SUM_EXCISE_ADVALOREM_PAID 11761 non-null float64 8 SUM_NET_MASS_KGS 12488 non-null float64 9 AVG_DUTIABLE_VALUE_PHP 12488 non-null float64 10 AVG_VAT_BASE 12488 non-null float64 11 AVG_VAT_PAID 12488 non-null float64 12 AVG_DUTY_PAID 12488 non-null float64 13 AVG_DUTIES_AND_TAXES 12488 non-null float64 14 AVG_EXCISE_ADVALOREM_PAID 11761 non-null float64 15 AVG_NET_MASS_KGS 12488 non-null float64 16 MIN_DUTIABLE_VALUE_PHP 12488 non-null float64 17 MIN_VAT_BASE 12488 non-null float64 18 MIN_VAT_PAID 12488 non-null float64 19 MIN_DUTY_PAID 12488 non-null float64 20 MIN_DUTIES_AND_TAXES 12488 non-null float64 21 MIN_EXCISE_ADVALOREM_PAID 11761 non-null float64 22 MIN_NET_MASS_KGS 12488 non-null object 23 MAX_DUTIABLE_VALUE_PHP 12488 non-null float64 24 MAX_VAT_BASE 12488 non-null float64 25 MAX_VAT_PAID 12488 non-null float64 26 MAX_DUTY_PAID 12488 non-null float64 27 MAX_DUTIES_AND_TAXES 12488 non-null float64 28 MAX_EXCISE_ADVALOREM_PAID 11761 non-null float64 29 MAX_NET_MASS_KGS 12488 non-null object 30 COUNT_ROWS 12488 non-null int64 dtypes: datetime64[ns](1), float64(26), int64(1), object(3) memory usage: 3.0+ MB
None
Summary statistics for numerical columns:
| SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | AVG_DUTIABLE_VALUE_PHP | AVG_VAT_BASE | AVG_VAT_PAID | ... | MIN_DUTY_PAID | MIN_DUTIES_AND_TAXES | MIN_EXCISE_ADVALOREM_PAID | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | COUNT_ROWS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 12,488.00 | 12,488.00 | 12,488.00 | 12,488.00 | 12,488.00 | 11,761.00 | 12,488.00 | 12,488.00 | 12,488.00 | 12,488.00 | ... | 12,488.00 | 12,488.00 | 11,761.00 | 12,488.00 | 12,488.00 | 12,488.00 | 12,488.00 | 12,488.00 | 11,761.00 | 12,488.00 |
| mean | 3,705,604,074.30 | 3,906,368,155.09 | 385,439,957.98 | 66,412,819.46 | 436,344,115.05 | 63,729,568.45 | 76,145,046.43 | 11,067,629.26 | 11,460,454.62 | 1,084,135.69 | ... | 46,794.87 | 174,420.47 | 13,888.12 | 407,039,667.15 | 447,962,151.29 | 38,188,646.19 | 7,884,036.28 | 45,774,547.80 | 13,682,083.79 | 2,666.05 |
| std | 14,095,020,333.99 | 14,816,430,982.05 | 1,592,111,110.49 | 375,977,478.03 | 1,811,322,871.89 | 388,016,312.84 | 284,943,650.29 | 79,693,933.75 | 80,747,503.75 | 8,723,205.39 | ... | 1,205,425.25 | 4,753,053.89 | 574,955.84 | 5,495,077,973.95 | 5,977,250,806.31 | 571,316,830.74 | 311,814,904.01 | 721,736,953.70 | 183,076,557.99 | 11,309.06 |
| min | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| 25% | 2,867,075.75 | 3,136,836.50 | 105,160.00 | 47,028.75 | 137,810.18 | 0.00 | 18,000.00 | 430,834.33 | 461,284.43 | 18,977.55 | ... | 0.00 | 0.00 | 0.00 | 1,578,494.50 | 1,698,414.01 | 72,122.35 | 31,036.47 | 91,853.75 | 0.00 | 3.00 |
| 50% | 41,182,518.70 | 43,836,093.50 | 2,755,241.85 | 1,007,581.50 | 3,400,088.35 | 0.00 | 357,861.30 | 1,229,490.22 | 1,304,991.08 | 84,159.39 | ... | 0.00 | 0.00 | 0.00 | 11,136,321.50 | 11,703,739.00 | 864,218.02 | 334,274.50 | 1,100,676.21 | 0.00 | 20.00 |
| 75% | 902,908,137.33 | 960,135,889.46 | 73,796,232.94 | 20,663,530.97 | 96,047,085.42 | 551,171.00 | 10,457,382.96 | 3,110,764.81 | 3,298,299.59 | 222,621.01 | ... | 0.00 | 260.73 | 0.00 | 108,574,085.97 | 113,725,717.00 | 8,204,910.50 | 2,265,258.50 | 10,846,446.75 | 285,025.74 | 322.00 |
| max | 479,519,065,452.04 | 485,505,503,066.43 | 57,494,932,398.89 | 34,617,435,553.84 | 70,730,995,658.83 | 14,320,855,982.51 | 3,652,341,993.22 | 3,859,296,288.00 | 3,871,315,449.00 | 464,557,853.00 | ... | 124,497,907.00 | 464,557,853.00 | 44,168,776.00 | 242,616,739,570.34 | 262,056,064,649.71 | 31,446,727,757.96 | 34,144,111,833.19 | 65,590,839,591.15 | 13,499,424,000.00 | 202,281.00 |
8 rows × 27 columns
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| COUNTRY_EXPORT | 12415 | 230 | KOREA | 159 |
| MIN_NET_MASS_KGS | 12488 | 3104 | 0.01 | 2669 |
| MAX_NET_MASS_KGS | 12488 | 6193 | 99000.0 | 220 |
Check for null columns:
MONTH_YEAR 0 COUNTRY_EXPORT 73 SUM_DUTIABLE_VALUE_PHP 0 SUM_VAT_BASE 0 SUM_VAT_PAID 0 SUM_DUTY_PAID 0 SUM_DUTIES_AND_TAXES 0 SUM_EXCISE_ADVALOREM_PAID 727 SUM_NET_MASS_KGS 0 AVG_DUTIABLE_VALUE_PHP 0 AVG_VAT_BASE 0 AVG_VAT_PAID 0 AVG_DUTY_PAID 0 AVG_DUTIES_AND_TAXES 0 AVG_EXCISE_ADVALOREM_PAID 727 AVG_NET_MASS_KGS 0 MIN_DUTIABLE_VALUE_PHP 0 MIN_VAT_BASE 0 MIN_VAT_PAID 0 MIN_DUTY_PAID 0 MIN_DUTIES_AND_TAXES 0 MIN_EXCISE_ADVALOREM_PAID 727 MIN_NET_MASS_KGS 0 MAX_DUTIABLE_VALUE_PHP 0 MAX_VAT_BASE 0 MAX_VAT_PAID 0 MAX_DUTY_PAID 0 MAX_DUTIES_AND_TAXES 0 MAX_EXCISE_ADVALOREM_PAID 727 MAX_NET_MASS_KGS 0 COUNT_ROWS 0 dtype: int64
Display % of null columns:
MONTH_YEAR 0.00 COUNTRY_EXPORT 0.58 SUM_DUTIABLE_VALUE_PHP 0.00 SUM_VAT_BASE 0.00 SUM_VAT_PAID 0.00 SUM_DUTY_PAID 0.00 SUM_DUTIES_AND_TAXES 0.00 SUM_EXCISE_ADVALOREM_PAID 5.82 SUM_NET_MASS_KGS 0.00 AVG_DUTIABLE_VALUE_PHP 0.00 AVG_VAT_BASE 0.00 AVG_VAT_PAID 0.00 AVG_DUTY_PAID 0.00 AVG_DUTIES_AND_TAXES 0.00 AVG_EXCISE_ADVALOREM_PAID 5.82 AVG_NET_MASS_KGS 0.00 MIN_DUTIABLE_VALUE_PHP 0.00 MIN_VAT_BASE 0.00 MIN_VAT_PAID 0.00 MIN_DUTY_PAID 0.00 MIN_DUTIES_AND_TAXES 0.00 MIN_EXCISE_ADVALOREM_PAID 5.82 MIN_NET_MASS_KGS 0.00 MAX_DUTIABLE_VALUE_PHP 0.00 MAX_VAT_BASE 0.00 MAX_VAT_PAID 0.00 MAX_DUTY_PAID 0.00 MAX_DUTIES_AND_TAXES 0.00 MAX_EXCISE_ADVALOREM_PAID 5.82 MAX_NET_MASS_KGS 0.00 COUNT_ROWS 0.00 dtype: float64
# Drop country_export with null values
df_summary_year_cntry.dropna(subset='COUNTRY_EXPORT', inplace=True)
df_summary_year_cntry.isna().sum()
MONTH_YEAR 0 COUNTRY_EXPORT 0 SUM_DUTIABLE_VALUE_PHP 0 SUM_VAT_BASE 0 SUM_VAT_PAID 0 SUM_DUTY_PAID 0 SUM_DUTIES_AND_TAXES 0 SUM_EXCISE_ADVALOREM_PAID 727 SUM_NET_MASS_KGS 0 AVG_DUTIABLE_VALUE_PHP 0 AVG_VAT_BASE 0 AVG_VAT_PAID 0 AVG_DUTY_PAID 0 AVG_DUTIES_AND_TAXES 0 AVG_EXCISE_ADVALOREM_PAID 727 AVG_NET_MASS_KGS 0 MIN_DUTIABLE_VALUE_PHP 0 MIN_VAT_BASE 0 MIN_VAT_PAID 0 MIN_DUTY_PAID 0 MIN_DUTIES_AND_TAXES 0 MIN_EXCISE_ADVALOREM_PAID 727 MIN_NET_MASS_KGS 0 MAX_DUTIABLE_VALUE_PHP 0 MAX_VAT_BASE 0 MAX_VAT_PAID 0 MAX_DUTY_PAID 0 MAX_DUTIES_AND_TAXES 0 MAX_EXCISE_ADVALOREM_PAID 727 MAX_NET_MASS_KGS 0 COUNT_ROWS 0 dtype: int64
# Read the dataset and display head
sql = """
SELECT *
FROM chapters
"""
df_chapters = pd.read_sql(sql, conn)
# Preview data
display(df_chapters.head())
# Get the dimensions of the df
# Get the dimensions of the df
print(f'Number of rows: {df_chapters.shape[0]}\n'
f'Number of columns: {df_chapters.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
display(df_chapters.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_chapters.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_chapters.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(df_chapters.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(df_chapters.isna().sum() / (len(df_chapters))*100)
| HSCODE_2 | CHAPTER | |
|---|---|---|
| 0 | 01 | Live animals |
| 1 | 02 | Meat and edible meat offal |
| 2 | 03 | Fish and crustaceans, molluscs and other aquatic invertebrates |
| 3 | 04 | Dairy produce; birds' eggs; natural honey; edible products of animal origin, not elsewhere specified or included |
| 4 | 05 | Products of animal origin, not elsewhere specified or included |
Number of rows: 97 Number of columns: 2 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 97 entries, 0 to 96 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 HSCODE_2 97 non-null object 1 CHAPTER 97 non-null object dtypes: object(2) memory usage: 1.6+ KB
None
Summary statistics for numerical columns:
| HSCODE_2 | CHAPTER | |
|---|---|---|
| count | 97 | 97 |
| unique | 97 | 97 |
| top | 01 | Live animals |
| freq | 1 | 1 |
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| HSCODE_2 | 97 | 97 | 01 | 1 |
| CHAPTER | 97 | 97 | Live animals | 1 |
Check for null columns:
HSCODE_2 0 CHAPTER 0 dtype: int64
Display % of null columns:
HSCODE_2 0.00 CHAPTER 0.00 dtype: float64
Refer to Notes below for notes about the completeness of the data.
Rice Importation
# Create a df to store yearly importation of rice
df_rice_imports = pd.read_sql("""
SELECT substr(month_year,1,4) as YR, COUNTRY_EXPORT,
SUM(SUM_NET_MASS_KGS)/1000 as SUM_NETMASS_MT
FROM summary
WHERE substr('0000000000'||HS_CODE, -11, 11) like '1006%'
GROUP BY YR, COUNTRY_EXPORT
""", conn)
# Preview data
display(df_rice_imports.head())
# Get the dimensions of the df
print(f'Number of rows: {df_rice_imports.shape[0]}\n'
f'Number of columns: {df_rice_imports.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
display(df_rice_imports.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_rice_imports.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_rice_imports.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(df_rice_imports.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(df_rice_imports.isna().sum() / (len(df_rice_imports))*100)
| YR | COUNTRY_EXPORT | SUM_NETMASS_MT | |
|---|---|---|---|
| 0 | 2015 | None | 2,500.00 |
| 1 | 2015 | BRAZIL | 0.33 |
| 2 | 2015 | CHINA | 5,702.03 |
| 3 | 2015 | INDIA | 11,342.79 |
| 4 | 2015 | INDONESIA | 0.17 |
Number of rows: 131 Number of columns: 3 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 131 entries, 0 to 130 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 YR 131 non-null object 1 COUNTRY_EXPORT 126 non-null object 2 SUM_NETMASS_MT 131 non-null float64 dtypes: float64(1), object(2) memory usage: 3.2+ KB
None
Summary statistics for numerical columns:
| SUM_NETMASS_MT | |
|---|---|
| count | 131.00 |
| mean | 130,655.48 |
| std | 424,588.40 |
| min | 0.00 |
| 25% | 1.75 |
| 50% | 300.00 |
| 75% | 19,840.02 |
| max | 2,521,693.07 |
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| YR | 131 | 8 | 2021 | 23 |
| COUNTRY_EXPORT | 126 | 31 | INDIA | 8 |
Check for null columns:
YR 0 COUNTRY_EXPORT 5 SUM_NETMASS_MT 0 dtype: int64
Display % of null columns:
YR 0.00 COUNTRY_EXPORT 3.82 SUM_NETMASS_MT 0.00 dtype: float64
# Drop country_export with null values
df_rice_imports.dropna(subset='COUNTRY_EXPORT', inplace=True)
df_rice_imports.isna().sum()
YR 0 COUNTRY_EXPORT 0 SUM_NETMASS_MT 0 dtype: int64
# Create a df to store top countries where PH imports rice
topn_rice = 40
top_rice_importers = (df_rice_imports.groupby(['YR', 'COUNTRY_EXPORT'])
.sum()
.nlargest(topn_rice, 'SUM_NETMASS_MT'))
list_top_rice_countries = (top_rice_importers.reset_index()['COUNTRY_EXPORT']
.unique().tolist())
paramlist = r'?'
for i in range(1, len(list_top_rice_countries)):
paramlist = paramlist + r', ?'
sql = f"""
SELECT substr(month_year,1,4) as YR, COUNTRY_EXPORT,
SUM(SUM_NET_MASS_KGS)/1000 as SUM_NETMASS_MT,
SUM(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
SUM(SUM_DUTIABLE_VALUE_PHP+SUM_DUTIES_AND_TAXES)/SUM(SUM_NET_MASS_KGS)
as SUM_DV_DT_PER_KGS
FROM summary
WHERE substr('0000000000'||HS_CODE, -11, 11) like '1006%'
AND country_export in ({paramlist})
GROUP BY YR, country_export
"""
top_countries_per_yr = pd.read_sql(sql, conn, params=list_top_rice_countries)
# Preview data
display(top_countries_per_yr.head())
# Get the dimensions of the df
print(f'Number of rows: {top_countries_per_yr.shape[0]}\n'
f'Number of columns: {top_countries_per_yr.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
display(top_countries_per_yr.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(top_countries_per_yr.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(top_countries_per_yr.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(top_countries_per_yr.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(top_countries_per_yr.isna().sum() / (len(top_countries_per_yr))*100)
| YR | COUNTRY_EXPORT | SUM_NETMASS_MT | SUM_DUTIES_AND_TAXES | SUM_DV_DT_PER_KGS | |
|---|---|---|---|---|---|
| 0 | 2015 | CHINA | 5,702.03 | 525,216.00 | 145.36 |
| 1 | 2015 | INDIA | 11,342.79 | 30,228,947.00 | 105.98 |
| 2 | 2015 | SINGAPORE | 0.12 | 3,454.00 | 411.80 |
| 3 | 2015 | THAILAND | 730,930.80 | 2,125,989,633.00 | 21.10 |
| 4 | 2015 | VIET NAM | 953,476.43 | 1,792,409,848.00 | 19.99 |
Number of rows: 52 Number of columns: 5 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 52 entries, 0 to 51 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 YR 52 non-null object 1 COUNTRY_EXPORT 52 non-null object 2 SUM_NETMASS_MT 52 non-null float64 3 SUM_DUTIES_AND_TAXES 52 non-null float64 4 SUM_DV_DT_PER_KGS 52 non-null float64 dtypes: float64(3), object(2) memory usage: 2.2+ KB
None
Summary statistics for numerical columns:
| SUM_NETMASS_MT | SUM_DUTIES_AND_TAXES | SUM_DV_DT_PER_KGS | |
|---|---|---|---|
| count | 52.00 | 52.00 | 52.00 |
| mean | 327,942.76 | 1,661,019,019.87 | 58.61 |
| std | 627,462.98 | 3,803,797,068.29 | 68.05 |
| min | 0.12 | 0.00 | 3.38 |
| 25% | 10,745.61 | 27,882,929.73 | 22.38 |
| 50% | 61,853.48 | 266,831,101.32 | 25.63 |
| 75% | 277,984.47 | 1,012,433,798.02 | 85.87 |
| max | 2,521,693.07 | 16,415,236,247.53 | 411.80 |
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| YR | 52 | 8 | 2017 | 7 |
| COUNTRY_EXPORT | 52 | 7 | CHINA | 8 |
Check for null columns:
YR 0 COUNTRY_EXPORT 0 SUM_NETMASS_MT 0 SUM_DUTIES_AND_TAXES 0 SUM_DV_DT_PER_KGS 0 dtype: int64
Display % of null columns:
YR 0.00 COUNTRY_EXPORT 0.00 SUM_NETMASS_MT 0.00 SUM_DUTIES_AND_TAXES 0.00 SUM_DV_DT_PER_KGS 0.00 dtype: float64
PREFERENTIAL_CODE pertains to codes used to refer to importations from countries with free trade agreements with the Philippines. Importations with these codes pay lower or zero duties. This is only available for selected transactions and countries.
Not all importations are required to pay an excise tax, so it's normal for PREFERENTIAL_CODE to have null values. Therefore, no additional clean-up or data transformation is required to address null/incomplete Preferential Code and Excise Ad Valorem tax values in the data frame since those columns are optional or on an as-applicable basis.
On the other hand, records with NULL values in the COUNTRY_EXPORT column of the data frame were removed because there's no other reliable source to identify which country the record belongs to. The COUNTRY_ORIGIN field which sometimes contains the 2-character code of the country cannot be used. Exploring this column showed that many records contain the same COUNTRY_ORIGIN but different COUNTRY_EXPORT values. Moreover, the total annual dutiable values of these records with NULL COUNTRY_EXPORT is less than 1% of the total dutiable value of imports for each year.
On another note, since the files were too large to save to a data frame at once (>6 GB), the team had to create different data frames for specific grouping by categories as itemized and processed above (i.e., Chapters and Tariff Codes, Rice Importation, and Census).
# Read the dataset and display head
sql2 = """
SELECT *
FROM census
"""
df_census = pd.read_sql(sql2, conn2)
# Preview data
display(df_census.head())
# Get the dimensions of the df
print(f'Number of rows: {df_census.shape[0]}\n'
f'Number of columns: {df_census.shape[1]}\n')
# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
'the df:\n')
display(df_census.info())
# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_census.describe())
# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_census.describe(include=[object]).T)
# Get count of null values
print(f'Check for null columns:\n')
display(df_census.isna().sum())
# display % of null values per column
print(f'Display % of null columns:\n')
display(df_census.isna().sum() / (len(df_census))*100)
| LOCATION | POPULATION_2015-8 | POPULATION_2020-5 | |
|---|---|---|---|
| 0 | PHILIPPINES | 100,981,437.00 | 109,035,343.00 |
| 1 | NATIONAL CAPITAL REGION (NCR) | 12,877,253.00 | 13,484,462.00 |
| 2 | CITY OF MANILA | 1,780,148.00 | 1,846,513.00 |
| 3 | CITY OF MANDALUYONG | 386,276.00 | 425,758.00 |
| 4 | CITY OF MARIKINA | 450,741.00 | 456,059.00 |
Number of rows: 135 Number of columns: 3 List of all columns, count of non-null values, and datatypes of the df: <class 'pandas.core.frame.DataFrame'> RangeIndex: 135 entries, 0 to 134 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 LOCATION 135 non-null object 1 POPULATION_2015-8 135 non-null float64 2 POPULATION_2020-5 135 non-null float64 dtypes: float64(2), object(1) memory usage: 3.3+ KB
None
Summary statistics for numerical columns:
| POPULATION_2015-8 | POPULATION_2020-5 | |
|---|---|---|
| count | 135.00 | 135.00 |
| mean | 2,244,000.32 | 2,422,976.54 |
| std | 8,847,721.60 | 9,558,833.63 |
| min | 17,246.00 | 18,831.00 |
| 25% | 388,495.50 | 427,890.00 |
| 50% | 689,668.00 | 728,402.00 |
| 75% | 1,677,498.50 | 1,776,898.00 |
| max | 100,981,437.00 | 109,035,343.00 |
Summary statistics for object(string) columns:
| count | unique | top | freq | |
|---|---|---|---|---|
| LOCATION | 135 | 135 | PHILIPPINES | 1 |
Check for null columns:
LOCATION 0 POPULATION_2015-8 0 POPULATION_2020-5 0 dtype: int64
Display % of null columns:
LOCATION 0.00 POPULATION_2015-8 0.00 POPULATION_2020-5 0.00 dtype: float64
NOTE: The census dataset is complete and without any null values, so no additional data transformation required.
# Display the importation data
df_imports_agg.head()
| MONTH_YEAR | HS_CODE | PREFERENTIAL_CODE | SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | ... | MIN_EXCISE_ADVALOREM_PAID | MIN_NET_MASS_KGS | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | MAX_NET_MASS_KGS | COUNT_ROWS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-1 | 10019019000 | None | 2,413,111,061.00 | 2,440,378,069.00 | 0.00 | 0.00 | 0.00 | NaN | 153,329,722.00 | ... | NaN | 1000787.0 | 369,253,676.00 | 372,889,558.00 | 0.00 | 0.00 | 0.00 | NaN | 8006296.0 | 20 |
| 1 | 2015-1 | 10019099000 | None | 1,448,184,761.00 | 1,571,805,522.00 | 0.00 | 101,372,917.00 | 101,372,917.00 | NaN | 125,329,720.00 | ... | NaN | 0.0 | 216,549,708.00 | 234,706,251.00 | 0.00 | 15,158,479.00 | 15,158,479.00 | NaN | 9460325.0 | 32 |
| 2 | 2015-1 | 10019099000 | AIFTA | 18,672,091.00 | 19,794,256.00 | 0.00 | 746,881.00 | 746,881.00 | NaN | 2,005,240.00 | ... | NaN | 229540.0 | 6,410,765.00 | 6,789,302.00 | 0.00 | 256,430.00 | 256,430.00 | NaN | 691410.0 | 5 |
| 3 | 2015-1 | 10019099000 | ANFTA | 262,946,606.00 | 266,354,909.00 | 0.00 | 0.00 | 0.00 | NaN | 20,094,214.00 | ... | NaN | 16129544.0 | 206,744,629.00 | 209,332,028.00 | 0.00 | 0.00 | 0.00 | NaN | 99390.0 | 13 |
| 4 | 2015-1 | 10019919000 | None | 77,393,616.00 | 78,178,366.00 | 9,381,401.00 | 0.00 | 9,381,401.00 | NaN | 4,470,871.00 | ... | NaN | 406149.0 | 17,263,204.00 | 17,433,569.00 | 2,092,028.00 | 0.00 | 2,092,028.00 | NaN | 989386.0 | 7 |
5 rows × 32 columns
Update MONTH_YEAR column to datetime format. Then create a new column named HSCODE_2 where we get the first 2 characters of the HS_CODE which correspond to the chapter an item belongs to. And finally, merge the imports dataframe with the df_chapters dataframe:
df_imports_agg['MONTH_YEAR'] = (pd.to_datetime(
df_imports_agg['MONTH_YEAR'], format='%Y-%m'))
df_imports_agg['HSCODE_2'] = df_imports_agg['HS_CODE'].str[:2]
df_imports_agg = df_imports_agg.merge(df_chapters, on='HSCODE_2')
df_imports_agg.head()
| MONTH_YEAR | HS_CODE | PREFERENTIAL_CODE | SUM_DUTIABLE_VALUE_PHP | SUM_VAT_BASE | SUM_VAT_PAID | SUM_DUTY_PAID | SUM_DUTIES_AND_TAXES | SUM_EXCISE_ADVALOREM_PAID | SUM_NET_MASS_KGS | ... | MAX_DUTIABLE_VALUE_PHP | MAX_VAT_BASE | MAX_VAT_PAID | MAX_DUTY_PAID | MAX_DUTIES_AND_TAXES | MAX_EXCISE_ADVALOREM_PAID | MAX_NET_MASS_KGS | COUNT_ROWS | HSCODE_2 | CHAPTER | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 | 10019019000 | None | 2,413,111,061.00 | 2,440,378,069.00 | 0.00 | 0.00 | 0.00 | NaN | 153,329,722.00 | ... | 369,253,676.00 | 372,889,558.00 | 0.00 | 0.00 | 0.00 | NaN | 8006296.0 | 20 | 10 | Cereals |
| 1 | 2015-01-01 | 10019099000 | None | 1,448,184,761.00 | 1,571,805,522.00 | 0.00 | 101,372,917.00 | 101,372,917.00 | NaN | 125,329,720.00 | ... | 216,549,708.00 | 234,706,251.00 | 0.00 | 15,158,479.00 | 15,158,479.00 | NaN | 9460325.0 | 32 | 10 | Cereals |
| 2 | 2015-01-01 | 10019099000 | AIFTA | 18,672,091.00 | 19,794,256.00 | 0.00 | 746,881.00 | 746,881.00 | NaN | 2,005,240.00 | ... | 6,410,765.00 | 6,789,302.00 | 0.00 | 256,430.00 | 256,430.00 | NaN | 691410.0 | 5 | 10 | Cereals |
| 3 | 2015-01-01 | 10019099000 | ANFTA | 262,946,606.00 | 266,354,909.00 | 0.00 | 0.00 | 0.00 | NaN | 20,094,214.00 | ... | 206,744,629.00 | 209,332,028.00 | 0.00 | 0.00 | 0.00 | NaN | 99390.0 | 13 | 10 | Cereals |
| 4 | 2015-01-01 | 10019919000 | None | 77,393,616.00 | 78,178,366.00 | 9,381,401.00 | 0.00 | 9,381,401.00 | NaN | 4,470,871.00 | ... | 17,263,204.00 | 17,433,569.00 | 2,092,028.00 | 0.00 | 2,092,028.00 | NaN | 989386.0 | 7 | 10 | Cereals |
5 rows × 34 columns
OBJECTIVE:
The objective of this EDA is to identify outliers, trends, and patterns, and provide insights regarding the Philippine Importations from 2015 to 2022, especially those years during PRRD's Term, and to determine imported rice consumption patterns of Filipinos between 2015 (before RTL) and 2020 (a year after RTL implementation).
QUESTIONS:
1. Are there outliers in the importation data?
2. What are the trends involving Philippine importation from 2015 to 2022?
2.1 Yearly
2.2 Monthly
3. Which chapters dominate the Philippine importation scene? Which chapters are weak?
3.1 During PRRD's Term (June 2016 to June 2022)
3.2 Yearly (from 2015 to 2022)
4. Which countries dominate the Philippine importation scene?
Are there noticeable patterns in the countries where we typically import?
4.1 During PRRD's Term (June 2016 to June 2022)
4.2 Yearly (from 2015 to 2022)
5. Are there significant changes in imported rice consumption and trends before
and after RTL implementation?
Outliers were identified in the imports data. The process of identification and removal of the outliers were performed in "Customs-outliers.ipynb" file.
Out of all the identified outliers, only three items were considered invalid. The individual importation records for those three items were determined to be invalid because those items were recorded at above PHP 1 trillion dutiable value for a single declaration which have never happened between 2015 to Sept 2022.
The invalid rows removed from the dataset are shown below:
outliers()
| MONTH_YEAR | HS_CODE | ACTUAL_DUTY | COUNTRY_EXPORT | COUNTRY_ORIGIN | CURRENCY | DUTIABLE_VALUE_FOREIGN | DUTIABLE_VALUE_PHP | DUTIES_AND_TAXES | DUTY_PAID | EXCHANGE_RATE | EXCISE_ADVALOREM_PAID | GOODS_DESCRIPTION | NET_MASS_KGS | PREFERENTIAL_CODE | VAT_BASE | VAT_PAID | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-5 | 85331090000 | None | JAPAN | JP | USD | 85331090061.0 | 4,481,759,512,184.00 | 0.00 | 0.00 | 52.522 | 0.00 | RESISTOR | 5.3 | None | 4,487,361,717,889.00 | 538,483,406,146.00 |
| 1 | 2021-3 | 73182990000 | None | GERMANY, FEDERAL REPUBLIC OF | DE | USD | 284808279857.97 | 13,862,473,405,527.00 | 3,217,563,438,244.03 | 1,386,247,340,552.70 | 48.673 | 0.00 | SCREW | 0.5 | None | 15,260,967,480,761.10 | 1,831,316,097,691.33 |
| 2 | 2021-3 | 73269099000 | None | GERMANY, FEDERAL REPUBLIC OF | DE | USD | 117771598642.4 | 5,732,297,020,721.54 | 1,652,371,150,129.57 | 859,844,553,108.23 | 48.673 | 0.00 | THREAD INSERT | 0.5 | None | 6,604,388,308,511.18 | 792,526,597,021.34 |
trends_yearly()
INSIGHTS:
The dutiable value of Philippine imports had a stable increase of 10% from 2016 to 2018. It also had a steady dutiable value of at least Php 6.1 trillion from 2018 up to the first half of 2022. The dutiable value of Philippine imports from other countries had a stable increase of 10% from 2016 to 2018. There's a little decline in 2020 due to lockdown protocols implemented during the peak of the pandemic. A slight increase in 2021 was brought about by the delays on recording from the previous year.
The massive increase in dutiable value did not magnify the trend in duties and taxes.
Assessed duties and taxes of dutiable imports increased massively starting in 2020. The 8-9% duties and taxes rate from 2016-2019 increased to 15-16% from 2020 onwards. This was mainly due to the implementation of online filing starting in 2020 to deliberately record the correct duties and taxes of importations.
Interestingly, this period when BOC had increased duties and taxes was also former Commissioner Rey "Jagger" Guerrero and his reform team's time in the Bureau of Customs. He is a straight arrow and a true reformist known for his integrity and professionalism in making the right things done. It was when BOC started its "Transformation Program" and enrolled in the Performance Governance System of the Institute for Solidarity in Asia from which they got the prestigious Island of Good Governance distinction after passing all four stages of the system with Gold Trailblazer awards.
trends_monthly()
INSIGHTS:
The monthly total dutiable value of imports had a decline coinciding with the start of the lockdown. However, it reached an all-time high a few months later. Upon inspection of the records, the sudden peak in total importation values can be attributed to electronics, specifically laptops. This was the time when offices were shifting to a work-from-home setup and some schools enforced online learning.
Since the pandemic, the trend of the monthly total dutiable values of imports had been steadily increasing. For the last 4 months (June-Sept 2022), the monthly dutiable value was above 700 billion PHP. This threshold has only been surpassed four other times before (Nov 2016, March 2019, July 2020, and March 2022).
The biggest spikes in dutiable value didn't reflect to the duties and taxes. This means that the items that caused the spikes had tax exemptions. This can be seen in the effective rate of duties and taxes over the dutiable value graph where the dips in percentages coincide with the peaks in dutiable value.
Before the pandemic, the effective rate of duties and taxes over dutiable value was consistently playing at around less than 10%. It spiked at the start of the pandemic, and presently, it seems to have stabilized at around 15%.
BOC shifted to online filing of goods declaration and supporting documents when the COVID-19 pandemic started in March 2020. BOC banked on the new IT online systems to reduce face-to-face interaction and subsequently curb graft and corruption when transacting with the Bureau. The changes in the IT systems and processes of BOC implemented under the administration of former BOC Commissioner Rey Leonardo B. Guerrero seem to have contributed to the significant increase in duties and taxes of importations since March 2020 since all transactions are online and can be seen in the head office's Commissioner's dashboard immediately. The upward trend of the charts above clearly shows the remarkable improvement in the assessment of duties and taxes since 2019.
prrd_top_bot_chapters()
INSIGHTS:
top5_compare_chapters()
INSIGHTS:
yearly_top20_chapters()
INSIGHTS
compare_top2_chapters()
INSIGHTS
bottom5_compare_chapters()
INSIGHTS:
yearly_bottom5_chapters()
INSIGHTS:
prrd_top_countries()
sum of dutiable value during PRRD's Term = 36.60 trillion sum of duties and taxes during PRRD's Term = 4.10 trillion
| COUNTRY_EXPORT | SUM_DUTIABLE_VALUE_PHP | % OF TOTAL | |
|---|---|---|---|
| 0 | CHINA | 7,170,404,163,498.27 | 19.59 |
| 1 | JAPAN | 3,867,783,984,392.89 | 10.57 |
| 2 | KOREA | 3,274,610,539,045.60 | 8.95 |
| 3 | UNITED STATES | 2,580,633,510,576.17 | 7.05 |
| 4 | SINGAPORE | 2,236,211,172,285.96 | 6.11 |
| 5 | THAILAND | 2,205,012,917,816.22 | 6.02 |
| 6 | INDONESIA | 2,124,130,447,336.61 | 5.80 |
| 7 | HONG KONG | 1,774,062,107,696.08 | 4.85 |
| 8 | TAIWAN | 1,660,756,199,020.07 | 4.54 |
| 9 | MALAYSIA | 1,400,963,233,365.91 | 3.83 |
| COUNTRY_EXPORT | SUM_DUTIES_AND_TAXES | % OF TOTAL | |
|---|---|---|---|
| 0 | CHINA | 899,059,707,407.24 | 21.95 |
| 1 | KOREA | 396,657,896,234.21 | 9.68 |
| 2 | JAPAN | 360,157,525,006.46 | 8.79 |
| 3 | INDONESIA | 319,177,254,798.15 | 7.79 |
| 4 | THAILAND | 305,772,745,263.82 | 7.46 |
| 5 | SINGAPORE | 292,424,740,426.86 | 7.14 |
| 6 | MALAYSIA | 197,255,985,481.58 | 4.82 |
| 7 | UNITED STATES | 184,232,648,605.71 | 4.50 |
| 8 | VIET NAM | 139,382,163,316.48 | 3.40 |
| 9 | TAIWAN | 122,886,160,713.98 | 3.00 |
INSIGHTS:
Except for the United States, 9 out of the 10 countries with the highest dutiable value when it comes to imports are Asian countries. 4 out of these belong to ASEAN countries. These top 10 countries amounted to 77.31% of the total dutiable value imported during PRRD's term. Lastly, China is the biggest importer in terms of dutiable value which amounted to 19.59% of the total during Duterte's term.
Similar to dutiable values, 9 out of the 10 countries with highest duties and taxes from importing are Asian countries except for the US. 5 out of these also belong to the ASEAN countries. The top 10 countries contributed 78.53% of the total duties and taxes from imports during Duterte's term. Similarly, China is the highest tax payer which was assessed to have 21.95% of the total duties and taxes from imports during PRRD's term.
Hong Kong is among the top 10 countries in terms of dutiable value of imports, but is not included in the 10 highest payers of duties and taxes.
Vietnam is among the top 10 countries in terms of duties and taxes, but is not among the importers with highest dutiable value.
yearly_top15_countries_dv()
The above chart shows that the dutiable value of CHINA importations grew consistently from 2016 to 2020 but started decreasing slightly from 2021 onwards. It has remained the top importing country since 2016. It breached the 1 trillion PHP mark in 2018.
While the USA importations have not increased but maintained between 300 to 400 Billion PHP.
top15_countries_increases_dv()
JAPAN, the second highest importing country, is far below CHINA in terms of the total dutiable value of importations.
It is interesting to note also that the PH increased importations from CHINA, JAPAN, KOREA and HONG KONG in 2020, the start of the COVID-19 pandemic year, while the other top countries showed a decrease in importations.
yearly_top15_countries_dt()
rice_volume()
rice_dt()
The Rice Tariffication Law (RTL) which took effect in March 2019 replaced the quantitative restrictions on rice importations with 35% to 40% tariff. From the graphs above, rice importations from Viet Nam increased while rice importations from Thailand decreased. Alongside it, the duties and taxes of Viet Nam rice importations jumped 2.5 times from 4.5 Billion pesos to 12.6 Billion pesos after a year of implementing the RTL.
rice_costperkilo()
Since 2019, the importation cost of rice from Viet Nam has gone down compared to Thailand's rice importation cost.
The cost of rice importation ranged between 20 to 28 pesos per kilogram. However, rice from Pakistan and Myanmar registered even lower importation cost since 2017 and 2019, respectively.
rice_per_capita()
The above graph shows the Philippine rice importation in Metric Tons from 2015 to 2022 and the population count in 2015 and 2020. It's interesting to note that back in 2015, the ratio of rice importation versus population is around 16.85 kilograms of imported rice per person. But in 2020, this ratio has increased to 20.55 kilograms per person.
The team's aim in conducting this study was to identify the trends, patterns, and insights regarding the performance of the Philippine importation during PRRD's term. Based on the team's detailed analyses, several insights were gathered that provided answers to the team's questions as elaborated further below.
The Duterte administration was known for its fight against illegal drugs, criminality, and corruption. Also, when PRRD took office in 2016, he pledged to shift the Philippines away from the United States in favor of China and Russia. However, towards his last two years in office, he had a change of heart that can probably be dated to as early as June 2020 because of the amped-up assertiveness of Beijing in the South China Sea like the incidents involving hundreds of Chinese militia boats.
Data analysis of the importations from 2015 to 2022 revealed massive growth of importations from China which broke the 1 trillion-peso total dutiable value in 2018 and peaked in 2020 but started decreasing since then. On the other hand, the importations from the United States remained stable between the 300 to 500 billion annual total dutiable value. The total dutiable value of importations from Japan was a far second from China reaching only 800 billion pesos in 2020.
Although the Duterte administration shifted its foreign policy relations to China and away from the United States, Duterte edged away from it during the last two years of his term. The effect of this can be seen from the start of the decrease in the total dutiable value of importations from China since 2020.
It is also important to note that during Duterte’s term, more than three fourths of the total dutiable value from importations came from Asian countries. Among these countries are ASEAN-member countries like Thailand, Indonesia, Malaysia and Singapore. The United States is the only western country to be among the top importers in the Philippines. When it comes to the assessed amount of taxes and duties from importations during Duterte’s term, almost the same group of countries were observed with the exception of Vietnam. With importation value at 19.59% of the total during PRRD’s presidency, China ranks first with a total assessment of 21.95% of the whole duties and taxes assessed from imports during Duterte’s term.
The Philippine importations during the first and second half of PRRD's term showed an opposite trend between dutiable value and duties and taxes. From 2016 to 2018, the total dutiable value had an average growth rate of 25% but the total duties and taxes remained flat at 14%. However, from 2019 to 2022, the total dutiable value showed an average decrease of 1%, while the total duties and taxes posted a 19% increase.
Based on the team's analysis, the importations during the first half of PRRD's term were attributed largely to the significant increase in the importations from China. Since the Philippines has a free trade agreement with China, the majority of the goods imported from China were exempted from paying duties which therefore drove the effective rate of duties and taxes down. On the other hand, the increasing and all-time-high breaches in total duties and taxes assessed were mainly due to the appointment of the new BOC Commissioner Guerrero in 2019 amidst the COVID-19 pandemic. Guerrero's implementation of his 10-point priority program that included fully automating the transactions in the frontline and the pandemic that made it urgent for everyone to immediately embrace the online filing and processing initiatives of the BOC resulted in a notable and remarkable increase in the total duties and tax assessments and reduction in graft and corruption. Such a turn of events also resulted in another milestone that was reached where the total duties and taxes assessed after June 2020 breached 68 billion and never went below it again. This is an example of digitalization initiatives that the next term should prioritize and capitalize on to continue the positive and increasing trend in duties and taxes that the government collects.
In terms of chapters/groups of products being imported, the electrical machinery and mineral fuels chapters have consistently bagged the top two spots since 2015. Electrical machinery was largely imported from China so the effective duties and taxes were lower than that of the mineral fuels since the latter had higher duties and excise tax rates, in addition to VAT.
The Rice Tariffication Law also took effect during PRRD’s administration in March 2019. It effectively changed the quantitative restrictions on rice importation to a 35%-40% tariff rate. It also created the Rice Competitiveness Enhancement Fund (RCEF) or Rice Fund with a P10 billion annual appropriation for the next six years to be allocated and disbursed for rice farm machinery and equipment, rice seed development, propagation, and promotion, expanded rice credit assistance and rice extension services.
The analysis of rice importations showed that the volume increased 2.5 times since 2018 and annual total duties and taxes exceeded 10 billion pesos which are more than enough to support the RCEF or Rice Fund. Most of the imported rice came from Viet Nam totaling 2.5 million Metric Tons in 2021. On the other hand, there was a significant drop in the volume of imported rice from Thailand after 2018.
Relating rice importations with Philippine census data to the total population in 2015 and 2020, the ratio of imported rice per person increased from just 16.85 kilograms to 20.55 kilograms. The analysis also showed that although the cost of imported rice started decreasing since 2020, it is still above 23 pesos per kilogram. Pakistan and Myanmar have lower costs of imported rice that reached 20 pesos per kilogram.
Considering the varying insights obtained from the analyses above, the team has formed the following recommendations:
Data Quality: The BOC should consider setting stricter and more robust validation checks in the online goods declaration form. The BOC should put validation checks for each importation record being submitted by the importer (e.g., ensure that no blank inputs for required fields (such as COUNTRY_EXPORT), standardize country selection for both COUNTRY_EXPORT (full name) and COUNTRY_ORIGIN (2 characters), and no individual importation should exceed 1 trillion PHP (or the max allowable import declaration), among others).
Customs Memorandum Orders: The BOC should evaluate customs memorandum orders carefully, especially if they affect electronic shipments.
Continuity of projects with the new administration: The BOC should continue and improve the online filing of goods declaration since it had proven to be successful and has led to almost double the collection of duties and taxes than before such initiatives were implemented. It has also led to a reduction of corruption in the bureau.
Application Programming Interface (API): The BOC should consider using an API for sharing importation data to external parties so that the data correction and updates will be reflected in real-time. For data files already on the website, BOC should put data field description and notes for each monthly file. Also, historical files should be regularly updated to reflect data corrections. Date time fields like the date of declaration/entry, date of assessment, and date of payment should be added back just like in the previous years to allow more explorative analysis of time series data. Port codes should also be put back to allow the analysis of port importation performance.
Rice importation source: Encourage rice importation from Pakistan and Myanmar which have significantly lower effective importation costs than Viet Nam and Thailand rice shipments. This may further drive down the cost of rice in the Philippines. Also, the country needs to diversify its source of imported rice rather than relying entirely on Viet Nam.
Boost trade with countries other than China: Trade imports from Japan and Korea should be further boosted to get closer to China's figures on the dutiable value of imports.
Focus on the top 10 chapters: Knowing that at least 70% of the total dutiable value and duties and taxes are coming from the top 10 chapters, the relevant government agencies should prioritize thinking about how to potentially increase duties and taxes collection by studying those that were on the top 10 chapters based on dutiable value, but were not in the top 10 in terms of duties and taxes assessed. Based on such a trend in the top chapters, in combination with the export data, the government should also focus on encouraging investors to set up manufacturing and trading sites in the Philippines by including the industries affected in the Strategic Investment Priority Plan and others, as applicable.
Redirect importations: The relevant government agencies should consider proactively studying and assessing the statistics and trends in the effective cost of imported goods. If there are cheaper, same quality sources of imported goods, the DTI, or any other agencies should encourage the importers to source from those countries instead to help drive down the cost of products in the Philippines.
Although the team is confident about the findings in this report, it is important to address our assumptions and this study's limitations.
Based on our research, we found out that the publicly available customs importation dataset includes values that are based on the assessment date and information. Those columns with "PAID" suffixes in dutiable value, duties, taxes, VAT, and other columns do not actually mean paid but rather the amount that has been assessed based on the assessment date. Such an amount will only be paid after a certain number of days from the assessment date.
Moreover, though the team would have preferred to compare same-period datasets between census and customs importation data, the relevant census data that were available and were extracted only includes 2015 and 2020 values. In that case, we were only able to compare the 2015 and 2020 census data with customs importation data.
Lastly, for ease in comparing periodic trends, the team decided to use only the Philippine Peso value of the importations. Hence, this study excludes the impact of inflation and changes in exchange rates.